Topics

Allosteric communication

Real-time observation of ligand-induced allosteric transitions

While allostery is of paramount importance for protein regulation, the
underlying dynamical process of ligand (un)binding at one site, resulting time
evolution of the protein structure, and change of the binding affinity at a
remote site is not well understood. Here the ligand-induced conformational
transition in a widely studied model system of allostery, the PDZ2 domain, is
investigated by transient infrared spectroscopy accompanied by molecular
dynamics simulations. To this end, an azobenzene derived photoswitch is linked
to a peptide ligand in a way that its binding affinity to the PDZ2 domain
changes upon switching, thus initiating an allosteric transition in the PDZ2
domain protein. The subsequent response of the protein, covering four decades
of time ranging from ~1ns to ~10μs, can be rationalize by a
remodelling of its rugged free energy landscape, with very subtle shifts in the
populations of a small number of structurally well defined states. It is
proposed that structurally and dynamically driven allostery, often discussed as
limiting scenarios of allosteric communication, actually go hand-in-hand,
allowing the protein to adapt its free energy landscape to incoming signals.

•
Real-time observation of ligand-induced allosteric transitions in a PDZ domain,
PNAS 2020 (in press), arXiv:2008.01625

Nonequilibrium MD simulations of a proteins allosteric communication

Allostery represents a fundamental mechanism of biological regulation that is mediated via long-range communication between distant protein sites. Although little is known about the underlying dynamical process, recent time-resolved infrared spectroscopy experiments on a photoswitchable PDZ domain (PDZ2S) have indicated that the allosteric transition occurs on multiple timescales. Here, using extensive nonequilibrium molecular dynamics simulations, a time-dependent picture of the allosteric communication in PDZ2S is developed. The simulations reveal that allostery amounts to the propagation of structural and dynamical changes that are genuinely nonlinear and can occur in a nonlocal fashion. A dynamic network model is constructed that illustrates the hierarchy and exceeding structural heterogeneity of the process. In compelling agreement with experiment, three physically distinct phases of the time evolution are identified, describing elastic response (≲0.1 ns), inelastic reorganization (∼100 ns), and structural relaxation (≳1μs). Issues such as the similarity to downhill folding as well as the interpretation of allosteric pathways are discussed.

•
Time-resolved observation of protein allosteric communication,
PNAS 114, 33, E6804 (2017)

Biomolecular energy transport

Energy transport through molecular systems has recently received considerable interest,
in particular, due to its importance for molecular electronics and the functioning of
biological systems. In experiment, energy is usually induced into the molecule by
photoexcitation of a chromophore or by pumping a localized infrared vibration.
This triggers an cascade of dynamical processes including

- the femtosecond vibrational energy redistribution from localized high-frequency modes to delocalized low-frequency modes,
- transport of energy through the molecule, and
- the cooling the molecules due to heat transfer to the solvent.

We examine vibrational energy flow in dehydrated and hydrated villin headpiece subdomain HP36 by
master equation simulations. Transition rates used in the simulations are obtained from communication
maps calculated for HP36. In addition to energy flow along the main chain, we identify pathways
for energy transport in HP36 via hydrogen bonding between residues quite far in sequence space. The
results of the master equation simulations compare well with all-atom non-equilibrium simulations
to about 1 ps following initial excitation of the protein, and quite well at long times, though for some
residues we observe deviations between the master equation and all-atom simulations at intermediate
times from about 1–10 ps. Those deviations are less noticeable for hydrated than dehydrated HP36
due to energy flow into the water.

•
Scaling Rules for Vibrational Energy Transport in Globular Proteins
J. Phys. Chem. Lett. 7, 25 (2016)

Dissipation-corrected targeted molecular dynamics

While common molecular dynamics simulation methods are capable to simulate molecular processes on timescales up to milliseconds,
the biologically relevant range of timescales reaches from microseconds to minutes.
To enforce structural changes on such timescales, biased MD simulation methods have been developed,
which aim at predicting free energies along a predefined reaction coordinate.
A problem with this type of simulation and analysis is that the resulting free energies only allow limited
inference of the unbiased systems's dynamics and nonequilibrium effects such as dissipation,
as fast degrees of freedom have been integrated out.
To allow such a coarse-graining of system dynamics, we have developed dissipation-corrected targeted molecular dynamics.
Combining the Jarzynski equality and the Markovian Langevin equation, we derive an expression for a dissipation correction
that can be calculated on-the-fly for nonequilibrium pulling simulations.
Our approach does not only result in the free energy profile, but in friction factors, as well,
the latter allowing insights into fast dynamics and fluctuations along the biasing coordinate as well as their molecular origin.
Furthermore, both quantities can serve as input for integration of a Langevin equation,
speeding up calculations of system dynamics by several orders of magnitude.
Combination with our approach of temperature boosted Langevin equation simulations finally allows to access timescales of up to minutes.

•
Multisecond ligand dissociation dynamics from atomistic simulations,
Nat. Commun. 11, 2918 (2020)

Learning Collective Variables and Metastable States

While classical molecular dynamics (MD) simulations describe the motion of biomolecular systems
in terms of usually thousands of atoms, their essential dynamics takes place on a way smaller,
latent space, characterized by only few collective variables (CVs) \( \vec{x} \).
These variables are chosen in such a way, that they encompass the statistical (structural)
and dynamical features of these given systems: Biomolecular processes can then be described in
terms of the distribution \( P(\vec{x}) \) an the corresponding free energy landscape
\( \Delta G(\vec{x}) = - \mathrm{k}_B T \ln P(\vec{x}) \),
visualizing the meta-stable states of the systems and the pathways in between these states and giving the
bridge to coarse-grained models of these processes, like Langevin or Markov state models.
A suitable choice of coordinates \( x \) is therefore paramount for a precise description of the biomolecular system at hand.

• Perspective: Identification of collective variables and metastable states of protein dynamics,
J. Chem. Phys 149, 150901 (2018)

Free energy landscape of HP35 via PCA using (a) Cartesian coordinates, (b) contact distances and (c) backbone dihedral angles (dPCA+), and (d) obtained from TICA, showing a clear structural resolution with dPCA in this case.

Principal component analysis - Cartesian vs. internal coordinates

One widely-used approach to systematically construct a low-dimensional set of CVs is called principal component analysis (PCA).
Given some high-dimensional MD input data \( \vec{r} \), PCA describes the correlated motion of the system
via their projection onto the eigenvectors of their covariance matrix \( \sigma_{\vec{r}_i \vec{r}_j} \)
which account for the dynamics along the directions of maximum variance.
While the Cartesian coordinates of all atoms in the system might be the straightforward choice for \( \vec{r} \),
for flexible systems as a folding protein, this often leads to poor CVs,
since it is not possible to remove their overall rotation due to mixing with the internal motions.
Alternatively, internal coordinates, like dihedral angles and contact distances, can be suitable choices as inputs \( \vec{r} \):
The former are valuable conformational descriptors that directly indicate whether the protein forms helices,
sheets or loops. Due to their periodic nature, projections in circular space are ambiguous. We therefore proposed to use dPCA+,
which exploit the fact that steric hindrances limit the full angular space such that natural
cuts between sampled regions can be defined, allowing to perform PCA in a standard manner,
without distortion errors and coordinate doubling as in the usual sin/cos-transformed coordinates of regular dPCA.
The latter are well-suited to describe large-scale structural changes.
Since the use of all possible distances between C\(_\alpha\)-atoms scales quadratically with the number of considered atoms,
resulting in a numerically expensive analysis and moreover in highly correlated coordinates,
we suggested to focus on distances between interresidue contacts of the protein that are present in the native state of a protein,
since they have been shown to largely determine the folding pathways.
We consider a contact as formed if the distance between the closest lying heavy atoms of each residue is less than
\(0.45 \,\)nm and discard contacts between residues that are less than four residues apart, thereby omitting short-range contacts.

•
Principal component analysis of molecular dynamics: On the use of Cartesian vs. internal coordinates,
J. Chem. Phys. 141, 014111 (2014)

•
Principal component analysis on a torus: Theory and application to protein dynamics,
J. Chem. Phys 147, 244101 (2017)

•
Contact- and distance-based principal component analysis of protein dynamics
J. Chem. Phys. 143, 244114 (2015)

Density based clustering

Coming soon

The scheme of our machine learning algorithm (a) to identify essential internal coordinates shown by their feature importance (b) resulting from an iterative discard of unimportant features (c).

Learning the essential coordinates

Described as combinations of high-dimensional input coordinates, CVs tend to be hard to
interpret as they do not necessarily point to the essential internal coordinates, i.e.,
specific interatomic distances or dihedral angles that are important for the considered process.
Information on these coordinates, on the other hand, maybe provided by the metastable conformational
states of the system, which yield a well-defined representation of the low free energy regions.
Thus, using the data set of molecular coordinates (obtained from experiment or simulation) and an
associated set of metastable conformational states (obtained from clustering the data),
we can train a supervised machine learning model based on the learner XGBoost to assign
unknown molecular structures to the set of metastable states.
In this way, the model learns to determine the features of the molecular coordinates that
are most important to discriminate the states.
Using a new algorithm that exploits this feature importance via an iterative exclusion principle,
we identify the essential internal coordinates (such as specific interatomic distances or dihedral angles)
of the system, which are shown to represent versatile reaction coordinates that account
for the dynamics of the slow degrees of freedom and explain the mechanism of the underlying processes.
Moreover, these coordinates give rise to a free energy landscape that may reveal
previously hidden intermediate states of the system.

•
Machine Learning of Biomolecular Reaction Coordinates,
J. Phys. Chem. Lett. 9, 2144 (2018)

Unfolding of the \(\alpha\)-helical state of Ala\(_{10}\). Though in equilibrium (c) the unfolded state is not sampled, the energy landscapes can still be estimated using non-equilibrium pulling (d,e).

PCA of nonequilibrium MD simulations

While PCA is routinely applied to equilibrium molecular dynamics (MD) simulations,
it is less obvious how to extend the approach to nonequilibrium simulation techniques.
This includes, e.g., the definition of the statistical averages employed in PCA,
as well as the relation between the equilibrium free energy landscape
\( \Delta G(\vec{x}) \) and energy landscapes \( \Delta \mathcal{G}(\vec{x}) \)
obtained from nonequilibrium MD.
As an example for a nonequilibrium method, "targeted MD" is considered which employs a
moving distance constraint to enforce rare transitions along some biasing coordinates.
The introduced bias can be described by a weighting function \(P(s) \),
which provides a direct relation between equilibrium and nonequilibrium data,
and thus establishes a well-defined way to perform PCA on nonequilibrium data.
While the resulting distribution \( P(\vec{x}) \) and energy \( \Delta G \propto \ln P \)
will not reflect the equilibrium state of the system, the nonequilibrium energy landscape
\(\Delta \mathcal{G}(\vec{x}) \) may directly reveal the molecular reaction mechanism.
Although the formulation is in principle exact, its practical use depends critically on
the choice of the biasing coordinates, which should account for a naturally occurring
motion between two well-defined end-states of the system.

•
Principal component analysis of nonequilibrium molecular dynamics simulations,
J. Chem. Phys 150, 101063 (2019)

Markov State Modeling of Nonequilibrium Processes

Under construction

Langevin Modeling

A popular approach to analyze complex biomolecular systems is to define a low-dimensional
reaction coordinate which aims for an appropriate description of the essential collective
dynamics. Assuming a separation of time scales between those coordinates and the neglected
orthogonal degrees of freedom, the Markovian Langevin equation represents a powerful
framework to investigate and interpret the observed dynamics. We have developed a data-driven
Langevin equation (dLE) algorithm to estimate multidimensional
Langevin fields from MD data. Only based on local information, the dLE allows for the
interpretation of enhanced sampling data to predict long time dynamics like transition
times.
Based on dLE studies we proposed a practical method to construct an analytically defined
global model of structural dynamics. This approach employs an “empirical valence
bond”-type model which can cover multidimensional free energy landscapes as well as
an approximate description of the friction field estimated by the dLE. Adopting alanine
dipeptide and a three-dimensional model of heptaalanine as simple examples, the resulting
Langevin model reproduced the results of the underlying all-atom simulations.
For further applications, analytically defined models can be used to investigate the
dependence of the system on parameter changes and to predict the effect of site-selective
mutations on the dynamics.

• Global Langevin model of multidimensional biomolecular dynamics,
J. Chem. Phys. 145, 184114 (2016)

• Multidimensional Langevin Modeling of Nonoverdamped Dynamics,
Phys. Rev. Lett. 115,
050602 (2015)

Vibrational Signatures of Biomolecular Dynamics

Modern techniques of infrared spectroscopy, in particular multidimensional and time-resolved version of it,
have opened the way to monitor secondary structure and dynamics of polypeptides and proteins.
In particular the amide I band, reflecting the backbone C=O stretching vibration,
has been found to yield structural information.
To obtain a first principles description of the amide I vibrational spectrum of peptides in aqueous solution,
we pursue a quantum-classical approach which consists of a classical molecular dynamics simulation of
the conformational distribution of the system, density functional theory calculations of the conformation-dependent
and solvent-induced frequency fluctuations, and a semiclassical description of the vibrational line shapes.

Vibrational Spectroscopic Map, Vibrational Spectroscopy, and Intermolecular Interaction

Real time observation of ultrafast peptide conformational dynamics: MD simulation vs. IR experiment

Employing nonequilibrium molecular dynamics (MD) simulations and transient infrared (IR) spectroscopy,
a joint theoretical/experimental study on a water-soluble photoswitchable octapeptide designed by
Renner et al. [Biopolymers 63, 382 (2002)] is presented.
The simulations predict the cooling of the hot photoproducts on a time scale of 7 ps and complex conformational
rearrangements ranging from a few picoseconds to several nanoseconds.
The experiments yield a dominant fast relaxation time of 5 ps, which is identified as the cooling time
of the peptide in water and also accounts for initial conformational changes of the system.
Moreover, a weaker component of 300 ps is found, which reflects the overall conformational relaxation of the system.
The virtues and the limitations of the joint MD/IR approach to describe biomolecular conformational rearrangements are discussed.

• Real Time Observation of Ultrafast Peptide Conformational Dynamics: Molecular Dynamics Simulation vs Infrared Experiment,
J. Phys. Chem. B 115, 13084 (2011)

Simulation of transient infrared spectra of a photoswitchable peptide

Nonadiabatic Dynamics

Ultrafast nonadiabatic photoreactions

Maximum Caliber

We show how to apply a general theoretical approach to nonequilibrium statistical mechanics, called Maximum Caliber,
originally suggested by E. T. Jaynes 30 years ago, to a problem of two-state dynamics. Maximum Caliber is a variational
principle for dynamics in the same spirit that Maximum Entropy is a variational principle for equilibrium statistical
mechanics. The central idea is to compute a dynamical partition function, a sum of weights over all microscopic paths,
rather than over microstates. We illustrate the method on the simple problem of two-state dynamics, A ↔ B, first
for a single particle, then for M articles. Maximum Caliber gives a unified framework for deriving all the relevant
dynamical properties, including the microtrajectories and all the moments of the time-dependent probability density.
While it can readily be used to derive the traditional master equation and the Langevin results, it goes beyond them in
also giving trajectory information. For example, we derive the Langevin noise distribution, rather than assuming it.
As a general approach to solving nonequilibrium statistical mechanics dynamical problems, Maximum Caliber has some advantages:

- It is partition-function-based, so we can draw insights from similarities to equilibrium statistical mechanics,
- it is trajectory-based, so it gives more dynamical information than population-based approaches like master equations; this is particularly important for few-particle and single-molecule systems,
- it gives an unambiguous way to relate flows to forces, which has traditionally posed challenges, and
- like Maximum Entropy, it may be useful for data analysis, specifically for time-dependent phenomena.

Graphical representation of the dynamical partition function of a two-state system, showing all possible trajectories for N=3 time steps for a system stating in A.

•
Maximum Caliber: A variational approach applied to two-state dynamics,
J. Chem. Phys. 128, 194102-194102-12 (2008)

•
Maximum caliber inference of nonequilibrium processes,
J. Chem. Phys. 133, 034119 (2010)

•
Inferring Transition Rates of Networks from Populations in Continuous-Time Markov Processes,
J. Chem. Theory Comput. 11, 5464 (2015)