In this tutorial we will use pmx and Gromacs to set up alchemical free energy calculations for a relative free energy calculation study. We will further perform the simulations and estimate the actual ddG value.
The protein and ligand topologies for this case have already been generated and described in the study:
Gapsys, Perez-Benito et al. Large scale relative protein ligand binding affinities using non-equilibrium alchemy. 2020. Chemical Science
https://pubs.rsc.org/en/content/articlelanding/2020/SC/C9SC03754C
All the input/output files from this work can be found on the pmx github page:
https://github.com/deGrootLab/pmx/tree/master/protLig_benchmark
In the tutorial, we will use a pair of ligands (18625-1: Cl in ortho position and 18626-1: Cl in meta position) binding to the JNK1 kinase and estimate the relative free energy difference $\Delta\Delta G$ between these molecules. The experimentally measured double free energy difference for this edge is -3.22 kJ/mol.
For the calculations, we will construct the following thermodynamic cycle:
conda create -n PMX python=3.7
conda activate PMX
conda install numpy matplotlib scipy pip jupyterlab pandas rdkit
git clone https://github.com/deGrootLab/pmx
cd pmx
git checkout develop
pip install .
We will set up the calculations in a non-equilibrium framework.
Firstly, two 6 ns long equilibrium simulations will be performed in the physical states A and B. This will be done for the ligand solvated in water and protein-ligand complex.
Afterwards, we will extract snapshots from the equilibrium simulations and run 80 rapid (50 ps each) transitions in the directions A->B and A<-B.
During the transitions, we will record the work required for the ligand morphing. Finally, the work will be related to $\Delta G$ by applying Crooks fluctuation theorem.
The overall workflow consists of the following steps:
The workflow in sketched in the following figure. The protocol is followed for estimation of both $\Delta G_{L}$ and $\Delta G_{PL}$.
Download and unzip the input files.
import subprocess
subprocess.call('wget http://pmx.mpibpc.mpg.de/Tutorial2024/Coimbra_2024/tutorial_files.zip', shell=True) #Download input_files.zip
subprocess.call('unzip tutorial_files.zip', shell=True)
import essential modules
import pmx
from pmx.utils import create_folder
from pmx import gmx, ligand_alchemy, jobscript
import sys
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline
%pylab inline
from IPython.core.display import clear_output
import os,shutil
import re
import subprocess
import glob
import random
import pandas as pd
from AZtutorial import *
from IPython.display import Image
Here, we will define the paths to the protein and ligand topologies, as well as to the molecular dynamics parameter files (.mdp).
We will also set some main parameters: water model, salt concentration, number of simulation replicas to use.
# initialize the free energy environment object: it will store the main parameters for the calculations
fe = AZtutorial( )
# set the workpath
fe.workPath = 'workpath'
# set the path to the molecular dynamics parameter files
fe.mdpPath = 'input/mdppath'
# set the number of replicas (several repetitions of calculation are useful to obtain reliable statistics)
fe.replicas = 3
# provide the path to the protein structure and topology
fe.proteinPath = 'input/protein_amber'
# provide the path to the folder with ligand structures and topologies
fe.ligandPath = 'input/ligands'
# provide edges
fe.edges = [ ['18625-1','18626-1'] ]
# finally, let's prepare the overall free energy calculation directory structure
fe.prepareFreeEnergyDir( )
1a. Firstly, let's create a hybrid structure/topology for the two ligands forming an edge. For that we will establish a mapping of atoms to be morphed between the molecules.
# this command will map the atoms of all edges found in the 'fe' object
# bVerbose flag prints the output of the command
fe.atom_mapping(bVerbose=False)
1b. Secondly, we will construct a hybrid structure and topology based on the established mapping
fe.hybrid_structure_topology(bVerbose=False)
1c. Finally, we assemble the ligand and ligand+protein systems, i.e. create structures and topologies that will be used further in the step 2.
fe.assemble_systems( )
Prepare simulation boxes with solvent and ions. Subsequently energy minimize the systems, prepare equilibrium simulations and start the runs.
2a. Build boxes, solvate, add ions
fe.boxWaterIons( )
2b. Energy minimization.
Prepare the simulation and subsequently either
a) run locally (suitable for the tutorial purpose)
b) create jobscripts to submit to the cluster
fe.prepare_simulation( simType='em' )
Running energy minimization locally (e.g. on a laptop) will take 10-15 minutes for one edge.
You can use the command below for that.
If you do not want to wait that long and have an access to a cluster, skip this step and proceed to the next step.
fe.run_simulation_locally( simType='em', bProt=False, ntomp='8', bVerbose=True )
We can also submit the jobs to run on the cluster.
This, naturally, requires some knowledge of the cluster available, queueing system and some dependencies. Several jobscript parameters allow to define the environment on the cluster. If there are additional cluster specifics, the jobscripts can be easily modified.
Together with the jobscripts, a script submit.py
will be generated. This script can be used to simply submit all the jobs to the cluster.
# set several parameters
fe.JOBqueue = 'SGE'
fe.JOBsource = ['/etc/profile.d/modules.sh','/usr/local/gromacs/GMXRC2024']
fe.JOBmodules = ['shared','owl/intel-mpi-default','cuda91']
fe.JOBgpu = True
fe.JOBgmx = 'mdrun_threads'
# create the jobscripts
fe.prepare_jobscripts(simType='em')
2c. Once the energy minimization has finished, we can prepare equilibrium simulations.
fe.prepare_simulation( simType='eq_nvt', bProt=False)
fe.run_simulation_locally( simType='eq_nvt', bProt=False, ntomp='8', bVerbose=True )
# set several parameters
fe.JOBqueue = 'SGE'
fe.JOBsource = ['/etc/profile.d/modules.sh','/usr/local/gromacs/GMXRC2024']
fe.JOBmodules = ['shared','owl/intel-mpi-default','cuda91']
fe.JOBgpu = True
fe.JOBgmx = 'mdrun_threads'
# create the jobscripts
fe.prepare_jobscripts(simType='eq_nvt')
Again, prepare and submit the equilibrium simulation jobs to the cluster.
This time the calculation will take longer (we are performing 6 ns simulations).
On a node with a GPU and 8 CPUs simulations of ligand in water will take ~2 hours, ligand-protein complex will take ~10 hours.
fe.prepare_simulation( simType='eq', bProt=False)
fe.prepare_jobscripts(simType='eq')
To use the pre-calculated data, simply change the working directory of your tutorial object "fe" (command below). By default, all your calculations were performed in the "workpath" directory. The following command will set it to the "workpath_precalculated". We will also set the number of replicass to 1, because for the brevity of the current tutorial only one repetition of the calculation was performed.
Note, that in the folder "workpath_precalculated" only the final results, i.e. dhdl.xvg files, have been generated for all the replicas for the forwards/reverse directions for the solvated ligand and ligand-protein complex. The raw generated trajectories are only provided for two cases: 'water/stateA/run1' and 'water/stateB/run1'. This is simply due to the large size of the trajectories: all the simulation data would take more than 2 GB. This means that Step 3
can only be performed for this one ligand transition in water. However, this is not a concern for the final analysis in Step 4
, because the results from the transitions are present for all the cases considered in the tutorial.
fe.replicas = 1
fe.workPath = 'workpath_precalculated'
At this point we already have the trajectories for equilibrium ensembles of two ligand states in water and in complex with the protein.
3a.
From these simulations we will extract the snapshots and initiate rapid transitions from one state to the other.
fe.prepare_transitions( bGenTpr=True, bProt=False )
3b.
Prepare submission scripts for transitions.
Each transition is very short (50 ps), therefore, we can reduce the number of required CPUs and flood our cluster with many small jobs.
fe.JOBsimcpu = 2
fe.JOBqueue = 'SGE'
fe.JOBsource = ['/etc/profile.d/modules.sh','/usr/local/gromacs/GMXRC2024']
fe.JOBmodules = ['shared','owl/intel-mpi-default','cuda91']
fe.JOBgpu = True
fe.JOBgmx = 'mdrun_threads'
fe.prepare_jobscripts(simType='transitions', bProt=False)
4a.
Firstly, let's process the calculated work values and estimate free energy differences.
The pmx analyse
program will integrate the dhdl.xvg files to obtain the forward and reverse work values.
Subsequently, the Crooks fluctuation theorem and Jarzynski estimators will be applied to relate the work distributions to $\Delta G$
fe.run_analysis( bVerbose=True)
Let's have a look into one of plots (workpath_precalculated/edge_18625-1_18626-1/protein/analyse1/wplot.png) generated by the above analysis.
A nice overlap between the forward and reverse distributions ensures convergence of the free energy simulations.
4b.
Finally, we can collect all the results into one data structure and investigate the outcome.
Let's also set the number of replicas back to 3, because results from three repeats have been pre-calculated for this tutorial.
Here, we read the $\Delta G$ values and calculate $\Delta\Delta G$.
The calculated error reflects the statistical uncertainty and the variation between multiple calculation repeats.
fe.replicas = 3
fe.analysis_summary( )
fe.resultsAll
fe.resultsSummary
Our estimated double free energy difference for this edge is -5.02 $\pm$ 0.54 kJ/mol, which is in close agreement with the experimental value of -3.22 kJ/mol.