Analysis of molecular dynamics simulations

Overview

question Questions
  • Which analysis tools are available?

objectives Objectives
  • Learn which analysis tools are available.

  • Analyse a protein and discuss the meaning behind each analysis.

requirements Requirements

time Time estimation: 1 hour

level Level: Intermediate

Supporting Materials

Introduction

Molecular dynamics simulations return highly complex data. The Cartesian positions of each atom of the system (thousands or even millions) are recorded at every time step of the trajectory; this may again be thousands to millions of steps in length. Therefore, some kind of further analysis is needed to extract useful information from the data.
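To give a feeling for the amount of data involved, the following minimal sketch estimates the size of the raw coordinates in a trajectory. The function name and the example system size are hypothetical, and the sketch assumes that each coordinate is stored as a 4-byte single-precision number; other formats, compression or extra per-frame data would change the result.

```python
def trajectory_size_bytes(n_atoms, n_frames, bytes_per_coordinate=4):
    """Approximate size of the raw coordinates in a trajectory.

    Each atom contributes three values (x, y, z) per saved frame.
    bytes_per_coordinate=4 is an assumption (single precision).
    """
    return n_atoms * n_frames * 3 * bytes_per_coordinate


# Hypothetical system: 100,000 atoms saved for 10,000 frames
size = trajectory_size_bytes(100_000, 10_000)
print(f"{size / 1e9:.1f} GB")  # prints "12.0 GB"
```

Even this modest example yields several gigabytes of coordinates, which is why summary measures such as those introduced below are needed.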

In this tutorial, we illustrate some of the analytical tools that can be used to investigate conformational changes, using a typical short simulation of the protein CBH1 as an example.

There are other analysis tools available; you are encouraged to try these out too.

Agenda

In this tutorial, we will cover:

  1. Get data
  2. Analysis with BIO3D
    1. RMSD
    2. RMSF
    3. PCA
    4. Workflow vs. individual tools
  3. Further analysis

Get data

The data required can be generated by completing the NAMD simulation tutorial. Access it from your history. Alternatively, download the data from the Zenodo link provided.

hands_on Hands-on: Upload cellulose simulation trajectory

  1. Create a new history

    tip Tip: Creating a new history

    Click the new-history icon at the top of the history panel

    If the new-history icon is missing:

    1. Click on the galaxy-gear icon (History options) on the top of the history panel
    2. Select the option Create New from the menu
  2. Import the files from Zenodo, or from your history, if you completed the previous NAMD simulation tutorial:

    https://zenodo.org/record/2537734/files/cbh1test.dcd
    
    • Copy the link location
    • Open the Galaxy Upload Manager (galaxy-upload on the top-right of the tool panel)

    • Select Paste/Fetch Data
    • Paste the link into the text field

    • Press Start

    • Close the window

    By default, Galaxy uses the URL as the name, so rename the files with a more useful name.

  3. Rename the file to ‘CBH1 trajectory’

    tip Tip: Renaming a dataset

    • Click on the galaxy-pencil icon (pencil) for the dataset to edit its attributes
    • In the central panel, change the Name field
    • Click the Save button

Analysis with BIO3D

We’ll carry out some basic analysis by calculating RMSD, RMSF and PCA. The tools use the Bio3D package, developed by the Grant lab.

RMSD

RMSD, or root-mean-square deviation, is a standard measure of structural distance between two sets of coordinates. It measures the average distance between corresponding atoms in a chosen group (e.g. the backbone atoms of a protein). If we calculate RMSD between two sets of atomic coordinates - for example, two time points from the trajectory - the value is a measure of how much the protein conformation has changed. Wikipedia provides more information.
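The definition above can be sketched in a few lines of Python. This is only an illustration, not the code used by the Galaxy tool: the function name and toy coordinates are made up, and the sketch assumes the two structures have already been superimposed (no fitting step is performed here).

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally sized lists of (x, y, z) tuples.

    Assumes the two structures are already superimposed.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("both structures need the same number of atoms")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Toy trajectory: three atoms, two frames; every atom moves 1 unit along z
frame0 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
frame1 = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (2.0, 0.0, 1.0)]
trajectory = [frame0, frame1]

# RMSD of every frame relative to the first frame
rmsd_series = [rmsd(frame0, frame) for frame in trajectory]
print(rmsd_series)  # prints [0.0, 1.0]
```

Plotting such a series against time gives the kind of RMSD plot produced in the hands-on step below.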

hands_on Hands-on: Calculate RMSD

  1. RMSD Analysis tool with the following parameters:
    • param-file “dcd trajectory input”: Trajectory file
    • param-file “pdb input”: Structure file
    • “Select domains”: Calpha (calculate RMSD only for the C-alpha atoms of the protein)
Snapshot of RMSD plot
Figure 1: RMSD plot for a short CBH1 simulation
Snapshot of RMSD histogram
Figure 2: RMSD histogram for a short CBH1 simulation

question Question

What do the features in the RMSD plot tell us?

solution Solution

The increase in the RMSD plot with time shows the protein steadily deviates from its original conformation.

The three peaks visible in the histogram suggest the presence of three main conformations that are accessed during the trajectory.
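To illustrate how a histogram can reveal such populations, here is a minimal sketch that bins a series of RMSD values and counts local maxima. The values, bin settings and helper names are invented for illustration, and the peak count is a crude heuristic that depends strongly on the chosen bin width; real data would need more careful treatment.

```python
def histogram(values, n_bins, low, high):
    """Count how many values fall into each of n_bins equal-width bins."""
    counts = [0] * n_bins
    width = (high - low) / n_bins
    for value in values:
        # Values equal to `high` are placed in the last bin
        index = min(int((value - low) / width), n_bins - 1)
        counts[index] += 1
    return counts


def count_peaks(counts):
    """Count bins that are higher than the left neighbour and not lower
    than the right neighbour (a deliberately simple peak definition)."""
    padded = [0] + counts + [0]
    return sum(
        1
        for i in range(1, len(padded) - 1)
        if padded[i] > padded[i - 1] and padded[i] >= padded[i + 1]
    )


# Invented RMSD values clustering around three levels
rmsd_values = [0.5, 0.6, 0.55, 1.5, 1.6, 2.5, 2.4, 2.6]
counts = histogram(rmsd_values, n_bins=6, low=0.0, high=3.0)
print(counts)               # prints [0, 3, 0, 2, 1, 2]
print(count_peaks(counts))  # prints 3
```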

RMSF

The root-mean-square fluctuation (RMSF) measures the average deviation of a particle (e.g. a protein residue) over time from a reference position (typically the time-averaged position of the particle). Thus, RMSF identifies the portions of the structure that fluctuate the most (or least) around their mean positions.
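As a rough illustration of this definition, the sketch below computes a per-atom RMSF using the time-averaged position as the reference. It is not the implementation used by the Galaxy tool: the function name and toy trajectory are hypothetical, and the frames are assumed to be aligned already.

```python
import math


def rmsf(trajectory):
    """Per-atom RMSF for a list of frames, each a list of (x, y, z) tuples.

    The reference position of each atom is its time-averaged position.
    Assumes the frames are already aligned to each other.
    """
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    result = []
    for atom in range(n_atoms):
        positions = [frame[atom] for frame in trajectory]
        mean = [sum(p[k] for p in positions) / n_frames for k in range(3)]
        mean_square = sum(
            sum((p[k] - mean[k]) ** 2 for k in range(3)) for p in positions
        ) / n_frames
        result.append(math.sqrt(mean_square))
    return result


# Toy trajectory: atom 0 is rigid, atom 1 oscillates along y
trajectory = [
    [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (5.0, 2.0, 0.0)],
    [(0.0, 0.0, 0.0), (5.0, -2.0, 0.0)],
]
values = rmsf(trajectory)
print(values)
```

The rigid atom gets an RMSF of zero, while the oscillating atom gets a clearly larger value; in a protein, the same contrast separates stable regions from flexible ones.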

hands_on Hands-on: Calculate RMSF

  1. RMSF Analysis tool with the following parameters:
    • param-file “dcd trajectory input”: Trajectory file
    • param-file “pdb input”: Structure file
    • “Select domains”: Calpha (calculate RMSF only for the C-alpha atoms of the protein)
Snapshot of RMSF plot
Figure 3: RMSF plot for a short CBH1 simulation

question Question

What can we learn from the features in the RMSF plot?

solution Solution

Regions with higher RMSF values are most likely loops with more conformational flexibility, where the structure is not as well defined.

This allows the results to be linked with experimental spectroscopic techniques that detect the secondary structure of a protein.

PCA

Principal component analysis (PCA) converts a set of correlated observations (here, the movements of all atoms in the protein) into a set of principal components that are linearly uncorrelated. Mathematically, it is a transformation of the data to a new coordinate system, in which the first coordinate captures the greatest variance, the second coordinate the second greatest variance, and so on.

You can read more about PCA on Wikipedia. In a nutshell, PCA takes a complex dataset with many variables and tries to distill the variables down to a few ‘principal components’ which still preserve most of the differences between the data.

In summary:

  • The PCA tool will calculate and return a PCA to determine the relationship between statistically meaningful conformations (major global motions) sampled during the trajectory. The tool returns several images of the PCA and the raw data in tab-separated format.
  • The PCA visualization tool will carry out PCA and return a trajectory of the selected principal component. This trajectory is useful for visualisation and for further investigating the interesting modes and changes that occur within a selected principal component.
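The transformation behind PCA can be sketched with a toy example. The code below is an illustration only and makes several assumptions: each frame has been flattened into a single coordinate vector, the data and function names are invented, and the first principal component is found with a simple power iteration to keep the example dependency-free, rather than with the eigenvalue decomposition or SVD offered by the tools.

```python
import math


def covariance(frames):
    """Return (covariance matrix, mean-centred frames).

    `frames` holds one flattened coordinate vector per trajectory frame.
    """
    n_frames, dim = len(frames), len(frames[0])
    means = [sum(f[j] for f in frames) / n_frames for j in range(dim)]
    centred = [[f[j] - means[j] for j in range(dim)] for f in frames]
    cov = [
        [sum(c[i] * c[j] for c in centred) / (n_frames - 1) for j in range(dim)]
        for i in range(dim)
    ]
    return cov, centred


def leading_component(cov, iterations=200):
    """Largest-variance direction and its variance, via power iteration."""
    dim = len(cov)
    vec = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iterations):
        product = [sum(cov[i][j] * vec[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in product))
        vec = [x / norm for x in product]
    product = [sum(cov[i][j] * vec[j] for j in range(dim)) for i in range(dim)]
    variance = sum(v * p for v, p in zip(vec, product))
    return vec, variance


# Toy "trajectory": five frames, two coordinates, most motion along the first
frames = [[-2.0, -0.1], [-1.0, 0.1], [0.0, 0.0], [1.0, -0.1], [2.0, 0.1]]
cov, centred = covariance(frames)
pc1, pc1_variance = leading_component(cov)

# Fraction of the total variance captured by the first component
total_variance = sum(cov[i][i] for i in range(len(cov)))
explained = pc1_variance / total_variance

# Projection of each frame onto PC1 (the values plotted along a PC1 axis)
scores = [sum(c[j] * pc1[j] for j in range(len(pc1))) for c in centred]
print(f"PC1 explains {explained:.1%} of the variance")
```

In this toy case almost all of the variance lies along one direction, so a single component describes the motion well. For a real protein, the projections of each frame onto the first few components are what appear in the PCA plots.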

hands_on Hands-on: Calculate PCA

  1. PCA tool with the following parameters:
    • param-file “dcd trajectory input”: Trajectory file
    • param-file “pdb input”: Structure file
    • “Use singular value decomposition (SVD) instead of default eigenvalue decomposition ?”: No
    • “Select domains”: Calpha
  2. PCA visualization tool with the following parameters:
    • param-file “dcd trajectory input”: Trajectory file
    • param-file “pdb input”: Structure file
    • “Use singular value decomposition (SVD) instead of default eigenvalue decomposition ?”: No
    • “Select domains”: Calpha
    • “Principal component id”: 1

PCA visualisation: This tool can generate small trajectories of the first three principal components. The .pdb or .nc files can be visualized using visualization software such as VMD.

Snapshot of PCA plot
Figure 4: PCA plot for a short CBH1 simulation

question Question

What do the features in the PCA plot tell us? Do the principal components have a meaning?

solution Solution

Here, PCA shows the statistically meaningful conformations in the CBH1 trajectory, so that the principal motions within the trajectory and the vital motions needed for conformational changes can be identified. Two distinct groupings can be seen along PC1, indicating a non-periodic conformational change. The groupings along PC2 and PC3 do not cluster completely separately, implying that these global motions are periodic. PC1 is linked to an active site motion that limits the motion to a key glycosidic bond.

Workflow vs. individual tools

You can choose to use the tools one by one as described above, or alternatively combine them into a single analysis using the workflow provided.

Snapshot of conformational analysis workflow
Figure 5: A simple analysis workflow

hands_on Hands-on: Upload a workflow

  1. Click on ‘Workflow’ in the toolbar at the top of the main Galaxy page. In the upper right corner of the central pane, click the ‘Upload or import workflow’ icon.

  2. Enter the ‘Archived workflow URL’ and click ‘Import workflow’.

    https://raw.githubusercontent.com/galaxyproject/training-material/master/topics/computational-chemistry/tutorials/analysis-md-simulations/workflows/main_workflow.ga
    

Further analysis

Further analyses are available; try out the MDAnalysis workflow, which includes a Ramachandran plot and various timeseries.

MDAnalysis workflow
Figure 6: MD analysis workflow

Conclusion

keypoints Key points

  • Multiple analyses, including timeseries, RMSD and PCA, are available

  • Analysis tools provide a deeper chemical understanding of the system

Useful literature

Further information, including links to documentation and original publications, regarding the tools, analysis techniques and the interpretation of results described in this tutorial can be found here.

congratulations Congratulations on successfully completing this tutorial!