Basic Background

Solvation free energy (or free energy of solvation) is generally defined as the energy required to dissolve a solute in a solvent. In other words, it is the difference in the free energy of the solute in vacuum and in the solvent phase. Solvation free energy is a fundamental thermodynamic quantity that helps to estimate various physicochemical properties of a solute. Accurately predicting solvation free energy is thus important to several research fields, including drug design, materials science, and environmental chemistry. There are many approaches for calculating solvation free energy, such as implicit and explicit solvent models, alchemical methods, and machine learning algorithms.

Inhere, we will estimate the solvation free energy of aniline molecule using (non-equillibrium) alchemical free energy method (the figure below). GROMACS-2021.7 (any other version should be fine) will be employed for the system set up and molecular dynamics simulations, whereas the analyses will be done with PMX. PMX also helps in hybrid structure and topology preparation as well as in the generation of efficient workflow while dealing with more complex problems, such as estimation of free energy change due to protein mutation, protein-ligand absolute/relative binding free energy, whose tutorials are covered here.

Prerequisite

Workflow

The workflow of the solvation free energy estimation is sketched in the following figure and explained below in detail.

If you are not interested in the details of various parameters, you can directly skip to the Simulations section.

The estimation of free energy difference using non-equilibrium alchemy requires an extensive (and only) sampling of the end states (λ = 0 and λ = 1). For this tutorial, these two end states the fully solvated aniline (λ = 0) and non-interacting aniline (λ = 1).

Both the end states use the same topology files; they are defined as λ=0/1 in "Free Energy" section of the MDP files. Note that mdp files for each step (em, eq_nvt, eq) are provided twice with suffix "_l0" and "_l1" for λ=0 and λ=1, respectively. Let's have a look into the "Free Energy" section of the em_l0.mdp file (input_file.zip).

FE_MDP.png

-- free-energy: The control for free energy calculation. yes/no : on/off.

-- init-lambda: The lambda value (0 or 1) to define the states. This keywork differentiate the two end states (solvated aniline: 0; non-interacting aniline: 1).

-- delta-lambda, sc-alpha, sc-sigma, sc-power, sc-couple: Not important here. These are explained later.

-- couple-moltype: The molecule type of aniline in the topology file. It is defined as "MOL".

-- couple-intramol: The way of treating intramolecular non-bonded interactions of the molecule defined in 'couple-moltype'. no: All intra-molecular non-bonded interactions are replaced by exclusions and explicit pair interactions. In this manner the decoupled state of the molecule corresponds to the proper vacuum state without periodicity effects. yes: The intra-molecular Van der Waals and Coulomb interactions are also turned on/off.

-- couple-lambda0: vdw-q: All interactions (van der Waals and electrostatic) are on at λ=0.

-- couple-lambda1: none: All interactions (van der Waals and electrostatic) are off at λ=1.

As you might have noticed, the only difference between em_l0.mdp and em_l1.mdp files is the value for the init-lambda keyword.

Next, both the end states are subjected to the following steps.

Let's have a look into the remaining free energy controlling parameters in ti_l0.mdp

TI_MDP.png

-- delta-lambda: The rate at which the transition is carried out from one state to the other (0 to 1). It's just the inverse of number of md steps. In our case, it's 1/100000 = 1e-5 (for 200 ps transition time and 2 fs time step).

-- sc-couple: Whether to apply the soft-core free energy interaction transformation to the Columbic interaction of a molecule.

-- sc-alpha, sc-sigma, sc-power: Various parameters for the soft-core transformation. As you might have noticed, these parameters are irrelevant with delta-lambda=0.

Great!! Now that we have an understanding of the workflow, let's jump into the water and get solvated :).

Simulations

Import essential modules

Some useful functions are defined.

Download the input files and create directory structure

System preparation, Energy minimization, Equilibrations

System Preparation: GROMACS editconf and solvate modules are used.

Energy Minimization: Again, standard grompp and mdrun modules are employed for tpr file generation and simulation run.

NVT Equilibration: 10 ps of nvt equilibration both the states might take 2-5 minutes in total. I guess we can easily afford waiting for 2-5 minutes.

NPT Equilibration: 6 ns of npt equilibration both the states might take a few hours in total. Hence, we will skip running the simulations here and use pre-simulated-files provided as a part of input files. (However, it is recommended to run the simulations by yourself.)

Non-equilibrium TI (transitions)

Copy traj.trr and tpr.tpr files from pre-simulated-files ("eq" directories) to your simulation directory.

Extract 100 frames from the last 5 ns of equilibration trajectory into 'transitions' directory. The files will be named as frame{1..100}.gro.

Create 100 tpr files using the frame{1..100}.gro files.

Now, we will run 200 short simulations (100 for each state), each of 200 ps length (non-equilibrium). Since this would again take quite a while, we'll skip this step (the next cell) and use the pre-simulated-files.

Gromacs writes dhdl.xvg files for free energy calculations, which are provided in pre-simulated-files and can be used by PMX for analyses.

In the above cell, the 200 short simulations are run serially, you can run them in parallel as well in a local cluster

Copy dhdl files from the pre-simulated-files to your simulation directory.

Analysis

The work value can be computed by integrating the dH/d$\lambda$ values from a dhdl.xvg file. We collect all the work values (100 from each forward and backward transitions) calculated from the dH/d$\lambda$ and estimate $\Delta G_{solv}$ from their distributions employing various estimators. The PMX tool is used here for the analyses. The next cell takes ~1 minute to complete.

Keywords arguments:- -fA & -fB: input: dhdl files from the Forward transitions (0-->1) and Backward transitions (1-->0), respectively;

-o: output: file containing results from various estimators such as Bennett Acceptance Ration (BAR), Crooks Gaussian Intersection (CGI) and Jarzynski equality;

-oA & -oB: output: files containing work values from forward and backward transitions;

-w: output: plot of Forward and Backward work distributions;

-t: input: temperature (K); -b: input: number of bootstrap samples

Let's have look into the work distribution generated in the previous step (Analysis/wplot.png). You're encouraged to check all the outputs in the "Analysis" folder.

As you might have noticed, there is a large overlap between the forward and backward work distributions. In such as scenario, all the three estimators (BAR, CGI, and Janzynski) provide similar $\Delta G$ values; you can check in the results.txt file.

BAR is also known to provide a better estimate of $\Delta G$ even in cases with a lesser (to no) work distributions overlap. The $\Delta G$ provided in the plot above is from BAR estimator, and the error is standard error obtained by bootstrapping of the work values.

Let's have a look at the BAR section of results.txt file.

Another quantity, known as the convergence estimate (BAR: Conv), is also reported. It's value fluctuate around 0 for a converged work distribution (i.e. a good work distribution overlap). A value close to 1 represents no work distribution overlap and hence a lack of convergence.

Finally, we estimated the -$\Delta G_{solv.}$ (desolvation free energy) of aniline using non-equilibrium alchemical calculation to be 22.44 $\pm$ 0.22 kJ/mol, which excellently matches with the experimentally reported value of 22.96 kJ/mol (Reference).

Further

Let's relax a bit and try to comprehend the potential of the tool we have now.

With knowledge of two end states at hand, one could accurately estimate the free energy difference between them. The two end states can be imagined in a way that provide us

Although we've been very thourough with the protocol, there are a few parameters we took for granted.

Got any question related to PMX? Ask at https://ask.bioexcel.eu/.

References and Further Reading