Topics

Allosteric communication

Real-time observation of ligand-induced allosteric transitions

While allostery is of paramount importance for protein regulation, the underlying dynamical process of ligand (un)binding at one site, resulting time evolution of the protein structure, and change of the binding affinity at a remote site is not well understood. Here the ligand-induced conformational transition in a widely studied model system of allostery, the PDZ2 domain, is investigated by transient infrared spectroscopy accompanied by molecular dynamics simulations. To this end, an azobenzene derived photoswitch is linked to a peptide ligand in a way that its binding affinity to the PDZ2 domain changes upon switching, thus initiating an allosteric transition in the PDZ2 domain protein. The subsequent response of the protein, covering four decades of time ranging from ~1ns to ~10μs, can be rationalize by a remodelling of its rugged free energy landscape, with very subtle shifts in the populations of a small number of structurally well defined states. It is proposed that structurally and dynamically driven allostery, often discussed as limiting scenarios of allosteric communication, actually go hand-in-hand, allowing the protein to adapt its free energy landscape to incoming signals.
Real-time observation of ligand-induced allosteric transitions in a PDZ domain, Olga Bozovic, Claudio Zanobini, Adnan Gulzar, Brankica Jankovic, David Buhrke, Matthias Post, Steffen Wolf, Gerhard Stock, Peter Hamm, PNAS 2020 (in press), arXiv:2008.01625

Nonequilibrium MD simulations of a proteins allosteric communication

Allostery represents a fundamental mechanism of biological regulation that is mediated via long-range communication between distant protein sites. Although little is known about the underlying dynamical process, recent time-resolved infrared spectroscopy experiments on a photoswitchable PDZ domain (PDZ2S) have indicated that the allosteric transition occurs on multiple timescales. Here, using extensive nonequilibrium molecular dynamics simulations, a time-dependent picture of the allosteric communication in PDZ2S is developed. The simulations reveal that allostery amounts to the propagation of structural and dynamical changes that are genuinely nonlinear and can occur in a nonlocal fashion. A dynamic network model is constructed that illustrates the hierarchy and exceeding structural heterogeneity of the process. In compelling agreement with experiment, three physically distinct phases of the time evolution are identified, describing elastic response (≲0.1 ns), inelastic reorganization (∼100 ns), and structural relaxation (≳1μs). Issues such as the similarity to downhill folding as well as the interpretation of allosteric pathways are discussed.
Sebastian Buchenberg, Florian Sittel, and Gerhard Stock, PNAS 114, 33, E6804 (2017)

Biomolecular energy transport

Energy transport through molecular systems has recently received considerable interest, in particular, due to its importance for molecular electronics and the functioning of biological systems. In experiment, energy is usually induced into the molecule by photoexcitation of a chromophore or by pumping a localized infrared vibration. This triggers an cascade of dynamical processes including
• the femtosecond vibrational energy redistribution from localized high-frequency modes to delocalized low-frequency modes,
• transport of energy through the molecule, and
• the cooling the molecules due to heat transfer to the solvent.
To study these processes in biomolecules such as peptides, proteins, and RNA, we combine quantum-classical techniques well-known from the description of gas phase reactions with biomolecular force fields typically used in all-atom molecular dynamics simulations. This leads to nonequilibrium molecular dynamics simulations, which mimic the laser excitation of the molecules by nonequilibrium phase-space initial condition for the solute and the solvent atoms. To understand the quantum effects involved, we furthermore perform classical and quantum-mechanical perturbation theory. The results are discussed in the light of recent time-resolved infrared experiments.

## Scaling Rules for Energy Transport

We examine vibrational energy flow in dehydrated and hydrated villin headpiece subdomain HP36 by master equation simulations. Transition rates used in the simulations are obtained from communication maps calculated for HP36. In addition to energy flow along the main chain, we identify pathways for energy transport in HP36 via hydrogen bonding between residues quite far in sequence space. The results of the master equation simulations compare well with all-atom non-equilibrium simulations to about 1 ps following initial excitation of the protein, and quite well at long times, though for some residues we observe deviations between the master equation and all-atom simulations at intermediate times from about 1–10 ps. Those deviations are less noticeable for hydrated than dehydrated HP36 due to energy flow into the water.
Scaling Rules for Vibrational Energy Transport in Globular Proteins S. Buchenberg, D. M. Leitner, and G. Stock, J. Phys. Chem. Lett. 7, 25 (2016)

Dissipation-corrected targeted molecular dynamics

While common molecular dynamics simulation methods are capable to simulate molecular processes on timescales up to milliseconds, the biologically relevant range of timescales reaches from microseconds to minutes. To enforce structural changes on such timescales, biased MD simulation methods have been developed, which aim at predicting free energies along a predefined reaction coordinate. A problem with this type of simulation and analysis is that the resulting free energies only allow limited inference of the unbiased systems's dynamics and nonequilibrium effects such as dissipation, as fast degrees of freedom have been integrated out. To allow such a coarse-graining of system dynamics, we have developed dissipation-corrected targeted molecular dynamics. Combining the Jarzynski equality and the Markovian Langevin equation, we derive an expression for a dissipation correction that can be calculated on-the-fly for nonequilibrium pulling simulations. Our approach does not only result in the free energy profile, but in friction factors, as well, the latter allowing insights into fast dynamics and fluctuations along the biasing coordinate as well as their molecular origin. Furthermore, both quantities can serve as input for integration of a Langevin equation, speeding up calculations of system dynamics by several orders of magnitude. Combination with our approach of temperature boosted Langevin equation simulations finally allows to access timescales of up to minutes.
Multisecond ligand dissociation dynamics from atomistic simulations, S. Wolf, B. Lickert, S. Bray, and G. Stock, Nat. Commun. 11, 2918 (2020)

Learning Collective Variables and Metastable States

While classical molecular dynamics (MD) simulations describe the motion of biomolecular systems in terms of usually thousands of atoms, their essential dynamics takes place on a way smaller, latent space, characterized by only few collective variables (CVs) $$\vec{x}$$. These variables are chosen in such a way, that they encompass the statistical (structural) and dynamical features of these given systems: Biomolecular processes can then be described in terms of the distribution $$P(\vec{x})$$ an the corresponding free energy landscape $$\Delta G(\vec{x}) = - \mathrm{k}_B T \ln P(\vec{x})$$, visualizing the meta-stable states of the systems and the pathways in between these states and giving the bridge to coarse-grained models of these processes, like Langevin or Markov state models. A suitable choice of coordinates $$x$$ is therefore paramount for a precise description of the biomolecular system at hand.
Free energy landscape of HP35 via PCA using (a) Cartesian coordinates, (b) contact distances and (c) backbone dihedral angles (dPCA+), and (d) obtained from TICA, showing a clear structural resolution with dPCA in this case.

Principal component analysis - Cartesian vs. internal coordinates

One widely-used approach to systematically construct a low-dimensional set of CVs is called principal component analysis (PCA). Given some high-dimensional MD input data $$\vec{r}$$, PCA describes the correlated motion of the system via their projection onto the eigenvectors of their covariance matrix $$\sigma_{\vec{r}_i \vec{r}_j}$$ which account for the dynamics along the directions of maximum variance. While the Cartesian coordinates of all atoms in the system might be the straightforward choice for $$\vec{r}$$, for flexible systems as a folding protein, this often leads to poor CVs, since it is not possible to remove their overall rotation due to mixing with the internal motions. Alternatively, internal coordinates, like dihedral angles and contact distances, can be suitable choices as inputs $$\vec{r}$$: The former are valuable conformational descriptors that directly indicate whether the protein forms helices, sheets or loops. Due to their periodic nature, projections in circular space are ambiguous. We therefore proposed to use dPCA+, which exploit the fact that steric hindrances limit the full angular space such that natural cuts between sampled regions can be defined, allowing to perform PCA in a standard manner, without distortion errors and coordinate doubling as in the usual sin/cos-transformed coordinates of regular dPCA. The latter are well-suited to describe large-scale structural changes. Since the use of all possible distances between C$$_\alpha$$-atoms scales quadratically with the number of considered atoms, resulting in a numerically expensive analysis and moreover in highly correlated coordinates, we suggested to focus on distances between interresidue contacts of the protein that are present in the native state of a protein, since they have been shown to largely determine the folding pathways. We consider a contact as formed if the distance between the closest lying heavy atoms of each residue is less than $$0.45 \,$$nm and discard contacts between residues that are less than four residues apart, thereby omitting short-range contacts.
Principal component analysis of molecular dynamics: On the use of Cartesian vs. internal coordinates, F. Sittel, A. Jain and G. Stock, J. Chem. Phys. 141, 014111 (2014)
Principal component analysis on a torus: Theory and application to protein dynamics, F. Sittel, T. Filk, and G. Stock, J. Chem. Phys 147, 244101 (2017)
Contact- and distance-based principal component analysis of protein dynamics M. Ernst, F. Sittel, and G. Stock, J. Chem. Phys. 143, 244114 (2015)

Density based clustering

Coming soon
The scheme of our machine learning algorithm (a) to identify essential internal coordinates shown by their feature importance (b) resulting from an iterative discard of unimportant features (c).

Learning the essential coordinates

Described as combinations of high-dimensional input coordinates, CVs tend to be hard to interpret as they do not necessarily point to the essential internal coordinates, i.e., specific interatomic distances or dihedral angles that are important for the considered process. Information on these coordinates, on the other hand, maybe provided by the metastable conformational states of the system, which yield a well-defined representation of the low free energy regions. Thus, using the data set of molecular coordinates (obtained from experiment or simulation) and an associated set of metastable conformational states (obtained from clustering the data), we can train a supervised machine learning model based on the learner XGBoost to assign unknown molecular structures to the set of metastable states. In this way, the model learns to determine the features of the molecular coordinates that are most important to discriminate the states. Using a new algorithm that exploits this feature importance via an iterative exclusion principle, we identify the essential internal coordinates (such as specific interatomic distances or dihedral angles) of the system, which are shown to represent versatile reaction coordinates that account for the dynamics of the slow degrees of freedom and explain the mechanism of the underlying processes. Moreover, these coordinates give rise to a free energy landscape that may reveal previously hidden intermediate states of the system.
Machine Learning of Biomolecular Reaction Coordinates, S.Brandt, F. Sittel, M. Ernst, and G. Stock, J. Phys. Chem. Lett. 9, 2144 (2018)
Unfolding of the $$\alpha$$-helical state of Ala$$_{10}$$. Though in equilibrium (c) the unfolded state is not sampled, the energy landscapes can still be estimated using non-equilibrium pulling (d,e).

PCA of nonequilibrium MD simulations

While PCA is routinely applied to equilibrium molecular dynamics (MD) simulations, it is less obvious how to extend the approach to nonequilibrium simulation techniques. This includes, e.g., the definition of the statistical averages employed in PCA, as well as the relation between the equilibrium free energy landscape $$\Delta G(\vec{x})$$ and energy landscapes $$\Delta \mathcal{G}(\vec{x})$$ obtained from nonequilibrium MD. As an example for a nonequilibrium method, "targeted MD" is considered which employs a moving distance constraint to enforce rare transitions along some biasing coordinates. The introduced bias can be described by a weighting function $$P(s)$$, which provides a direct relation between equilibrium and nonequilibrium data, and thus establishes a well-defined way to perform PCA on nonequilibrium data. While the resulting distribution $$P(\vec{x})$$ and energy $$\Delta G \propto \ln P$$ will not reflect the equilibrium state of the system, the nonequilibrium energy landscape $$\Delta \mathcal{G}(\vec{x})$$ may directly reveal the molecular reaction mechanism. Although the formulation is in principle exact, its practical use depends critically on the choice of the biasing coordinates, which should account for a naturally occurring motion between two well-defined end-states of the system.
Principal component analysis of nonequilibrium molecular dynamics simulations, M. Post, S. Wolf, and G. Stock, J. Chem. Phys 150, 101063 (2019)

Markov State Modeling of Nonequilibrium Processes

Under construction

Langevin Modeling

A popular approach to analyze complex biomolecular systems is to define a low-dimensional reaction coordinate which aims for an appropriate description of the essential collective dynamics. Assuming a separation of time scales between those coordinates and the neglected orthogonal degrees of freedom, the Markovian Langevin equation represents a powerful framework to investigate and interpret the observed dynamics. We have developed a data-driven Langevin equation (dLE) algorithm to estimate multidimensional Langevin fields from MD data. Only based on local information, the dLE allows for the interpretation of enhanced sampling data to predict long time dynamics like transition times. Based on dLE studies we proposed a practical method to construct an analytically defined global model of structural dynamics. This approach employs an “empirical valence bond”-type model which can cover multidimensional free energy landscapes as well as an approximate description of the friction field estimated by the dLE. Adopting alanine dipeptide and a three-dimensional model of heptaalanine as simple examples, the resulting Langevin model reproduced the results of the underlying all-atom simulations. For further applications, analytically defined models can be used to investigate the dependence of the system on parameter changes and to predict the effect of site-selective mutations on the dynamics.
• Global Langevin model of multidimensional biomolecular dynamics, Norbert Schaudinnus, Benjamin Lickert, Mithun Biswas and Gerhard Stock J. Chem. Phys. 145, 184114 (2016)
• Multidimensional Langevin Modeling of Nonoverdamped Dynamics, Norbert Schaudinnus, Björn Bastian, Rainer Hegger, and Gerhard Stock, Phys. Rev. Lett. 115, 050602 (2015)

Vibrational Signatures of Biomolecular Dynamics

Modern techniques of infrared spectroscopy, in particular multidimensional and time-resolved version of it, have opened the way to monitor secondary structure and dynamics of polypeptides and proteins. In particular the amide I band, reflecting the backbone C=O stretching vibration, has been found to yield structural information. To obtain a first principles description of the amide I vibrational spectrum of peptides in aqueous solution, we pursue a quantum-classical approach which consists of a classical molecular dynamics simulation of the conformational distribution of the system, density functional theory calculations of the conformation-dependent and solvent-induced frequency fluctuations, and a semiclassical description of the vibrational line shapes.

Vibrational Spectroscopic Map, Vibrational Spectroscopy, and Intermolecular Interaction

Real time observation of ultrafast peptide conformational dynamics: MD simulation vs. IR experiment

Employing nonequilibrium molecular dynamics (MD) simulations and transient infrared (IR) spectroscopy, a joint theoretical/experimental study on a water-soluble photoswitchable octapeptide designed by Renner et al. [Biopolymers 63, 382 (2002)] is presented. The simulations predict the cooling of the hot photoproducts on a time scale of 7 ps and complex conformational rearrangements ranging from a few picoseconds to several nanoseconds. The experiments yield a dominant fast relaxation time of 5 ps, which is identified as the cooling time of the peptide in water and also accounts for initial conformational changes of the system. Moreover, a weaker component of 300 ps is found, which reflects the overall conformational relaxation of the system. The virtues and the limitations of the joint MD/IR approach to describe biomolecular conformational rearrangements are discussed.
• Real Time Observation of Ultrafast Peptide Conformational Dynamics: Molecular Dynamics Simulation vs Infrared Experiment, Norbert P. H. Nguyen, H. Staudt, J. Wachtveitl, and G. Stock, J. Phys. Chem. B 115, 13084 (2011)

Simulation of transient infrared spectra of a photoswitchable peptide