Introduction


In this tutorial we will use pmx and Gromacs to set up alchemical free energy calculations for a relative free energy calculation study. We will further perform the simulations and estimate the actual ddG value.

The protein and ligand topologies for this case have already been generated and described in the study:
Gapsys, Perez-Benito et al. Large scale relative protein ligand binding affinities using non-equilibrium alchemy. 2020. Chemical Science
https://pubs.rsc.org/en/content/articlelanding/2020/SC/C9SC03754C

All the input/output files from this work can be found on the pmx github page:
https://github.com/deGrootLab/pmx/tree/master/protLig_benchmark

In the tutorial, we will use a pair of ligands (18625-1: Cl in ortho position and 18626-1: Cl in meta position) binding to the JNK1 kinase and estimate the relative free energy difference $\Delta\Delta G$ between these molecules. The experimentally measured double free energy difference for this edge is -3.22 kJ/mol.

For the calculations, we will construct the following thermodynamic cycle:

Prerequisite


    conda create -n PMX python=3.7
    conda activate PMX
    conda install numpy matplotlib scipy pip jupyterlab pandas rdkit
    git clone https://github.com/deGrootLab/pmx
    cd pmx
    git checkout develop
    pip install .


Method


We will set up the calculations in a non-equilibrium framework.
Firstly, two 6 ns long equilibrium simulations will be performed in the physical states A and B. This will be done for the ligand solvated in water and protein-ligand complex.
Afterwards, we will extract snapshots from the equilibrium simulations and run 80 rapid (50 ps each) transitions in the directions A->B and A<-B.
During the transitions, we will record the work required for the ligand morphing. Finally, the work will be related to $\Delta G$ by applying Crooks fluctuation theorem.


Workflow

The overall workflow consists of the following steps:

  1. Initialize the working environment
  2. Prepare hybrid structures/topologies
  3. Prepare equilibrium simulations
  4. Prepare non-equilibrium transitions
  5. Analysis

The workflow in sketched in the following figure. The protocol is followed for estimation of both $\Delta G_{L}$ and $\Delta G_{PL}$.

Download and unzip the input files.

import essential modules


Step 0. Initialize working environment

Here, we will define the paths to the protein and ligand topologies, as well as to the molecular dynamics parameter files (.mdp).

We will also set some main parameters: water model, salt concentration, number of simulation replicas to use.

1a. Firstly, let's create a hybrid structure/topology for the two ligands forming an edge. For that we will establish a mapping of atoms to be morphed between the molecules.

1b. Secondly, we will construct a hybrid structure and topology based on the established mapping

1c. Finally, we assemble the ligand and ligand+protein systems, i.e. create structures and topologies that will be used further in the step 2.


Step 2. Prepare equilibrium simulations.

Prepare simulation boxes with solvent and ions. Subsequently energy minimize the systems, prepare equilibrium simulations and start the runs.

2a. Build boxes, solvate, add ions

2b. Energy minimization.
Prepare the simulation and subsequently either
a) run locally (suitable for the tutorial purpose)
b) create jobscripts to submit to the cluster

Running energy minimization locally (e.g. on a laptop) will take 10-15 minutes for one edge.
You can use the command below for that. If you do not want to wait that long and have an access to a cluster, skip this step and proceed to the next step.

We can also submit the jobs to run on the cluster.
This, naturally, requires some knowledge of the cluster available, queueing system and some dependencies. Several jobscript parameters allow to define the environment on the cluster. If there are additional cluster specifics, the jobscripts can be easily modified.

Together with the jobscripts, a script submit.py will be generated. This script can be used to simply submit all the jobs to the cluster.

2c. Once the energy minimization has finished, we can prepare equilibrium simulations.

Again, prepare and submit the equilibrium simulation jobs to the cluster.
This time the calculation will take longer (we are performing 6 ns simulations).
On a node with a GPU and 8 CPUs simulations of ligand in water will take ~2 hours, ligand-protein complex will take ~10 hours.

The calculations for this step (and for the following step 3) would take a while. Therefore, I have already performed the simulations in advance, so we do not need to wait.

To use the pre-calculated data, simply change the working directory of your tutorial object "fe" (command below). By default, all your calculations were performed in the "workpath" directory. The following command will set it to the "workpath_precalculated". We will also set the number of replicass to 1, because for the brevity of the current tutorial only one repetition of the calculation was performed.
Note, that in the folder "workpath_precalculated" only the final results, i.e. dhdl.xvg files, have been generated for all the replicas for the forwards/reverse directions for the solvated ligand and ligand-protein complex. The raw generated trajectories are only provided for two cases: 'water/stateA/run1' and 'water/stateB/run1'. This is simply due to the large size of the trajectories: all the simulation data would take more than 2 GB. This means that Step 3 can only be performed for this one ligand transition in water. However, this is not a concern for the final analysis in Step 4, because the results from the transitions are present for all the cases considered in the tutorial.


Step 3. Prepare non-equilibrium transitions

At this point we already have the trajectories for equilibrium ensembles of two ligand states in water and in complex with the protein.

3a. From these simulations we will extract the snapshots and initiate rapid transitions from one state to the other.

3b. Prepare submission scripts for transitions.
Each transition is very short (50 ps), therefore, we can reduce the number of required CPUs and flood our cluster with many small jobs.


Step 4. Analysis

4a. Firstly, let's process the calculated work values and estimate free energy differences.
The pmx analyse program will integrate the dhdl.xvg files to obtain the forward and reverse work values.
Subsequently, the Crooks fluctuation theorem and Jarzynski estimators will be applied to relate the work distributions to $\Delta G$

Let's have a look into one of plots (workpath_precalculated/edge_18625-1_18626-1/protein/analyse1/wplot.png) generated by the above analysis.
A nice overlap between the forward and reverse distributions ensures convergence of the free energy simulations.

4b. Finally, we can collect all the results into one data structure and investigate the outcome.
Let's also set the number of replicas back to 3, because results from three repeats have been pre-calculated for this tutorial.
Here, we read the $\Delta G$ values and calculate $\Delta\Delta G$.
The calculated error reflects the statistical uncertainty and the variation between multiple calculation repeats.

Our estimated double free energy difference for this edge is -5.02 $\pm$ 0.54 kJ/mol, which is in close agreement with the experimental value of -3.22 kJ/mol.