2020 Theses Doctoral

# The Critical Assessment of Protein Dynamics using Molecular Dynamics (MD) Simulations and Nuclear Magnetic Resonance (NMR) Spectroscopy Experimentation

The biological functions of proteins often depend on conformational changes and on the rates at which those changes occur. Studies show that regions of a protein involved in enzyme catalysis or in contact with the substrate are identified by NMR spectroscopy as more flexible, as evidenced by the measured order parameters of specific bond vectors. While NMR can provide detailed characterization of the extent and time scales of these conformational fluctuations, it cannot easily produce the structures of sparsely populated intermediates, nor can it produce the explicit atomistic-level mechanisms needed to fully understand such processes. Practically, preparing a protein with the appropriate isotopic enrichment to study a set of specific bond vectors experimentally is also challenging. Oftentimes, measuring the dynamics of neighboring bond vectors is necessary as well.

Detailed studies of the coupling interactions among specific residues and protein regions can be carried out with molecular dynamics (MD) simulations. However, MD simulations rely on the ergodic hypothesis to mimic experimental conditions, which requires long simulation times. Simulations are additionally limited by the availability of accurate and reliable molecular mechanics force fields, which continue to be improved to better match experimental data. Much can also be learned from chemical theory and simulations to improve the methods by which experimental data are processed and analyzed.

The overarching goals of this thesis are to improve upon the results generated by existing methods in NMR spin relaxation spectroscopy, either by (i) improving the techniques used to analyze raw NMR data or by (ii) supporting experimental results with atomistically detailed MD simulations. Most of this work is illustrated with the protein Escherichia coli ribonuclease HI (ecRNH).

Ribonuclease HI (RNase H) is a conserved endonuclease responsible for cleaving the RNA strand of DNA/RNA hybrids in many biological processes, including reverse transcription of the viral genome in retroviral reverse transcriptases and Okazaki fragment processing during DNA replication of the lagging strand. RNase H belongs to a broader superfamily of nucleotidyl-transferases with conserved structure and mechanism, including retroviral integrases, Holliday junction resolvases, and transposases. RNase H has historically been the subject of many investigations in folding, structure, and dynamics.

In support of the first aim, we discuss new methods of obtaining more precise experimental results for order parameters and time constants for the ILV methyl groups. Deuterium relaxation rate constants are determined by the spectral density function for reorientation of the C-D bond vector at zero, single-quantum, and double-quantum 2H frequencies. We interpolate relaxation rates measured at available NMR spectrometer frequencies in order to perform a joint single/double-quantum analysis. This yields approximately 10-15% more precise estimates of model-free parameters and consequently provides a general strategy for further interpolation and extrapolation of data gathered from existing NMR spectrometers for analysis of 2H spin relaxation data in biological macromolecules.
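The frequencies sampled by deuterium relaxation can be illustrated with a minimal sketch. It assumes the standard Lipari-Szabo model-free spectral density with a 2/5 normalization, which may differ from the exact form used in the thesis (for example, it omits any scaling for fast methyl rotation). The function names, the frequency ratio constant, and the parameter values are illustrative assumptions.

```python
import math

# Approximate ratio of 2H to 1H Larmor frequencies (assumed constant).
FREQ_RATIO_2H_1H = 0.153506


def model_free_j(omega, s2, tau_m, tau_e):
    """Lipari-Szabo spectral density J(omega) with a 2/5 normalization.

    omega : angular frequency in rad/s
    s2    : generalized order parameter (0 to 1)
    tau_m : overall tumbling time in s
    tau_e : effective internal correlation time in s
    """
    tau = 1.0 / (1.0 / tau_m + 1.0 / tau_e)
    overall = s2 * tau_m / (1.0 + (omega * tau_m) ** 2)
    internal = (1.0 - s2) * tau / (1.0 + (omega * tau) ** 2)
    return 0.4 * (overall + internal)


def sampled_frequencies(proton_mhz):
    """Return (0, omega_D, 2*omega_D) in rad/s for a given 1H field in MHz."""
    omega_d = 2.0 * math.pi * proton_mhz * 1e6 * FREQ_RATIO_2H_1H
    return (0.0, omega_d, 2.0 * omega_d)


# Hypothetical parameters: S2 = 0.8, tau_m = 10 ns, tau_e = 50 ps.
for field in (400.0, 600.0, 800.0):
    values = [model_free_j(w, 0.8, 10e-9, 50e-12)
              for w in sampled_frequencies(field)]
    print(field, values)
```

In this sketch the double-quantum frequency at a 400 MHz spectrometer coincides with the single-quantum frequency at 800 MHz, which is one reason data from several static fields can be analyzed jointly.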

In support of the second aim, we calculate autocorrelation functions and generalized order parameters for the ILV methyl side chain groups from MD simulation trajectories to assess the orientational motions of the side chain bond vectors. We demonstrate that motions of the side chain bond vectors can be separated into: (i) fluctuations within a given dihedral angle rotamer, (ii) jumps among the different rotamers, and (iii) motions of the protein backbone itself, transmitted through the C-alpha carbon. We are able to match order parameters of the constitutive motions to conventionally calculated order parameters with R² = 0.9962, 0.9708, and 0.9905 for valine, leucine, and isoleucine residues, respectively. Some residues with longer side chains, such as leucine and isoleucine, have correlated χ1 and χ2 dihedral angle rotational motions. This provides a method of evaluating the relative contribution of each constitutive motion to the overall flexibility of a side chain. Multiple contributors of motion are possible for intermediate and low order parameters, which signify more flexible residues.
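A generalized order parameter can be estimated from a trajectory with the commonly used long-time-limit expression S² = (3/2) Σ⟨μᵢμⱼ⟩² − 1/2, where the sum runs over the Cartesian components of the unit bond vector. The sketch below assumes the frames have already been superposed to remove overall tumbling and that the vectors are normalized; the function name and sample data are illustrative, and this is not necessarily the procedure used in the thesis.

```python
def order_parameter(unit_vectors):
    """Estimate S^2 from a list of unit bond vectors (x, y, z), one per frame.

    Uses S^2 = 1.5 * sum_ij <mu_i * mu_j>^2 - 0.5, where the average is
    taken over all frames. Assumes overall tumbling was removed beforehand.
    """
    n_frames = len(unit_vectors)
    if n_frames == 0:
        raise ValueError("at least one frame is required")
    total = 0.0
    for i in range(3):
        for j in range(3):
            mean_ij = sum(v[i] * v[j] for v in unit_vectors) / n_frames
            total += mean_ij ** 2
    return 1.5 * total - 0.5


# A rigid bond vector gives S^2 = 1; isotropic sampling gives S^2 = 0.
rigid = [(0.0, 0.0, 1.0)] * 10
isotropic = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0),
             (0.0, 1.0, 0.0), (0.0, -1.0, 0.0),
             (0.0, 0.0, 1.0), (0.0, 0.0, -1.0)]
print(order_parameter(rigid), order_parameter(isotropic))
```

Applying the same function to vectors that include only one type of motion (for example, only rotamer jumps) is one way to probe the separate contributions described above.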

While developing protocols for MD simulations, we evaluate the effects of running 1-microsecond-long simulations and compare them to solution-state NMR spectroscopy. If the overall tumbling is removed from the simulation, then analysis blocks of 5-10 times the tumbling time are optimal for eliminating contributions from slower dynamics, which would not normally be measured in solution-state NMR spectroscopy. We also assess the quality of the TIP4P(-EW) water model relative to TIP3P; although TIP4P reproduces the isotropic tumbling time well for ecRNH, internal motions are unaffected by the choice of water model because the motions are well separated. Additionally, the TIP4P water model does not appear to be able to replicate an axially symmetric shape for ecRNH (ecRNH is mostly spherical and only slightly axially symmetric).
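The block analysis can be sketched as splitting a trajectory into consecutive windows whose length is a chosen multiple of the tumbling time. The helper below is a hypothetical illustration: the function name, the frame spacing, and the 10 ns tumbling time are assumed values, not parameters taken from the thesis.

```python
def block_ranges(n_frames, frame_spacing_ns, tau_m_ns, multiple=5):
    """Return (start, stop) frame index pairs for non-overlapping blocks.

    Each block spans `multiple` times the tumbling time tau_m_ns.
    Trailing frames that do not fill a whole block are discarded.
    """
    block_len = int(round(multiple * tau_m_ns / frame_spacing_ns))
    if block_len < 1:
        raise ValueError("block length is shorter than one frame")
    return [(start, start + block_len)
            for start in range(0, n_frames - block_len + 1, block_len)]


# Hypothetical example: 1 microsecond saved every 1 ns, assumed tau_m = 10 ns.
blocks = block_ranges(n_frames=1000, frame_spacing_ns=1.0,
                      tau_m_ns=10.0, multiple=5)
print(len(blocks), blocks[0], blocks[-1])
```

A per-block quantity, such as an order parameter, can then be computed on each window and averaged, so that motions slower than the block length do not contribute.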

The final work of this thesis returns to the first overarching aim; we develop a specialized method that utilizes probability distribution functions to model spectral density functions. We derive the inverse Gaussian probability distribution function from general properties of spectral density functions at low and high frequencies for macromolecules in solution, using the principle of maximum entropy. The resulting model-free spectral density functions are finite at a frequency of zero and can be used to describe distributions of either overall or internal correlation times using the model-free ansatz. The approach is validated using 15N backbone relaxation data for the intrinsically disordered, DNA-binding region of the bZip transcription factor domain of the Saccharomyces cerevisiae protein GCN4, in the absence of cognate DNA.
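The idea of a spectral density built from a distribution of correlation times can be sketched numerically. This example assumes that J(ω) is the average of Lorentzian terms τ/(1 + ω²τ²) weighted by an inverse Gaussian density with mean μ and shape λ, using a 2/5 normalization; the thesis may use a different, closed-form expression. Function names, the integration grid, and the parameter values are illustrative assumptions.

```python
import math


def inverse_gaussian_pdf(tau, mean, shape):
    """Inverse Gaussian probability density with the given mean and shape."""
    if tau <= 0.0:
        return 0.0
    prefactor = math.sqrt(shape / (2.0 * math.pi * tau ** 3))
    exponent = -shape * (tau - mean) ** 2 / (2.0 * mean ** 2 * tau)
    return prefactor * math.exp(exponent)


def distributed_j(omega, mean, shape, n_points=4000):
    """Average a Lorentzian over an inverse Gaussian distribution of tau.

    Integrates with the trapezoid rule on a logarithmic grid. The grid
    bounds are a heuristic chosen for shape values comparable to the mean.
    omega and tau must use consistent units (e.g. rad/ns and ns).
    """
    u_lo = math.log(mean * 1e-3)
    u_hi = math.log(mean + 80.0 * mean ** 2 / shape)
    du = (u_hi - u_lo) / (n_points - 1)
    total = 0.0
    for k in range(n_points):
        tau = math.exp(u_lo + k * du)
        weight = 0.5 if k in (0, n_points - 1) else 1.0
        lorentzian = tau / (1.0 + (omega * tau) ** 2)
        # The extra factor of tau is the Jacobian of tau = exp(u).
        total += weight * inverse_gaussian_pdf(tau, mean, shape) * lorentzian * tau
    return 0.4 * total * du


# Hypothetical parameters: mean correlation time 1 ns, shape 2 ns.
for omega in (0.0, 0.5, 1.0, 5.0):
    print(omega, distributed_j(omega, mean=1.0, shape=2.0))
```

Under these assumptions J(0) equals 2/5 times the mean correlation time, so the spectral density stays finite at zero frequency.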

## Files

*This item is currently under embargo. It will be available starting 2021-12-04.*

## More About This Work

- Academic Units: Chemistry
- Thesis Advisors: Friesner, Richard A.; Palmer III, Arthur G.
- Degree: Ph.D., Columbia University
- Published Here: January 17, 2020