2022 Theses Doctoral
Analysis of Conformational Continuum and Free-energy Landscapes from Manifold Embedding of Single-particle Cryo-EM Ensembles of Biomolecules
Biological molecules, or molecular machines, visit a continuum of conformational states as they go through work cycles required for their metabolic functions. Single-molecule cryo-EM of suitable in vitro systems affords the ability to collect a large ensemble of projections depicting the continuum of structures. This information, however, comes buried among typically hundreds of thousands of unorganized images formed under extremely noisy conditions and microscopy aberrations. Through the use of machine-learning algorithms, it is possible to determine a low-dimensional conformational spectrum from such data, with leading coordinates of the embedding corresponding to each of the system’s degrees of freedom.
By determining occupancies—or free energies—of the observed states, a free-energy landscape is formed, providing a complete mapping of a system’s configurations in state space while articulating its energetics topographically in the form of sprawling hills and valleys. Within this mapping, a minimum-energy path can be derived representing the most probable sequence of transitions taken by the machine between any two states in the landscape. Along this path, an accompanying sequence of 3D structures may be extracted for biophysical analysis, allowing the basis for molecular function to be elucidated. The ability to determine energy landscapes and minimum-energy paths experimentally from ensemble data opens a new horizon in structural biology and, by extension, molecular medicine.
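As an illustration of how occupancies relate to free energies, the sketch below applies the standard Boltzmann relation, ΔG = -k_B T ln(n / n_max), to a small grid of per-state image counts. It is not taken from the ManifoldEM code. The function name `occupancy_to_free_energy`, the 298.15 K default temperature, the kcal/mol units, and the example counts are all assumptions made for illustration.

```python
import numpy as np

# Boltzmann constant in kcal/(mol*K); the unit choice is an assumption.
K_B = 0.0019872041


def occupancy_to_free_energy(counts, temperature_k=298.15):
    """Convert per-state occupancy counts to relative free energies.

    The most populated state is assigned 0; unpopulated states are
    assigned +inf because no finite energy can be estimated for them.
    """
    counts = np.asarray(counts, dtype=float)
    if counts.size == 0 or counts.max() <= 0:
        raise ValueError("need at least one populated state")
    energies = np.full(counts.shape, np.inf)
    populated = counts > 0
    ratio = counts[populated] / counts.max()
    energies[populated] = -K_B * temperature_k * np.log(ratio)
    return energies


# Hypothetical 2x2 grid of image counts per conformational state.
counts = np.array([[120, 400], [0, 50]])
landscape = occupancy_to_free_energy(counts)
```

Sparsely populated states sit high on the resulting landscape and heavily populated states form its valleys, which is the topography the abstract describes.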
The present work is based on a geometric machine-learning approach that uses manifold embedding to obtain this information, an approach previously shown to be feasible on two experimental systems (the 80S ribosome and the ryanodine receptor) through an established framework termed ManifoldEM. First, this framework is incorporated into an advanced graphical user interface for public release and augmented with a new method, POLARIS, for determining minimum-energy pathways. ManifoldEM is next applied to two new systems, vacuolar ATPase and the SARS-CoV-2 spike protein, and for both systems several novel aspects of the machine’s function are observed.
During this exposition, critical limitations and uncertainties of the framework, as found throughout its extended development and use, are also presented. In the absence of ground-truth data, however, testing and validation of ManifoldEM are infeasible. As a recourse, a protocol is next proposed for generating simulated cryo-EM data from an atomic model subjected to multiple conformational changes and experimental conditions, with several Hsp90 synthetic ensembles generated for analysis by ManifoldEM. Guided by the results of these ground-truth studies, new insights are gained into the origin of longstanding ManifoldEM problems, which further motivates and informs the development of a new, comprehensive method for correcting them, termed ESPER. The ESPER method operates within the ManifoldEM framework and, as will be shown using both synthetic and experimentally obtained data, ultimately yields substantial improvements over the previous work. Finally, numerous recommendations are laid out for guiding future work on the ManifoldEM suite, particularly aimed at its next public release.
- Seitz_columbia_0054D_17043.pdf application/pdf 8.17 MB
- Supplementary.zip application/zip 423 MB
More About This Work
- Academic Units: Biological Sciences
- Thesis Advisors: Frank, Joachim
- Degree: Ph.D., Columbia University
- Published Here: February 2, 2022