Theses Doctoral

Accurate and Scalable Electronic Structure Methods and Machine Learning Potentials for Molecules and Materials Simulation

Wei, Yujing

Wavefunction methods address the Schrödinger equation HΨ = EΨ directly, by approximating the many-electron wavefunction Ψ. Because they are fully ab initio, these methods are accurate and systematically improvable.

In Chapter 1, we evaluate the accuracy of three widely used quantum chemical methods, CCSD(T), DLPNO-CCSD(T) and localized ph-AFQMC, for the thermochemistry of main group elements. DLPNO-CCSD(T) and localized ph-AFQMC, which scale better than canonical CCSD(T), have emerged over the last decade as pivotal tools for producing accurate benchmark chemical data. Our investigation covers closed-shell, neutral molecules, focusing on heats of formation and atomization energies drawn from four small-molecule datasets. First, we select molecules from the G2 and G3 datasets, which are noted for their reliable experimental heats of formation.

Additionally, we incorporate molecules from the W4-11 and W4-17 sets, which provide high-level theoretical reference values for atomization energies at 0 K. Our findings show that both DLPNO-CCSD(T) and ph-AFQMC can achieve a root-mean-square deviation (RMSD) of less than 1 kcal/mol across the combined dataset, meeting the threshold for chemical accuracy. Moreover, we work to confine the maximum deviations to within 2 kcal/mol, a degree of accuracy that significantly broadens the applicability of these methods in fields such as biology and materials science.

Wavefunction methods, though generally more computationally expensive than alternatives such as DFT and force fields, are still widely used for their accuracy and reliability. Some examples of applications are given in Chapter 2.
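The two error measures quoted for Chapter 1, the RMSD and the maximum deviation, can be illustrated with a minimal Python sketch. The function name `deviation_stats` and all numerical values below are hypothetical placeholders chosen for illustration; they are not data or code from the thesis.

```python
import math

# Thresholds quoted in the abstract, in kcal/mol.
CHEMICAL_ACCURACY = 1.0   # target for the RMSD
MAX_DEVIATION_GOAL = 2.0  # target for the largest single error


def deviation_stats(computed, reference):
    """Return (RMSD, maximum absolute deviation) of computed vs. reference values."""
    if len(computed) != len(reference) or not computed:
        raise ValueError("need two non-empty sequences of equal length")
    errors = [c - r for c, r in zip(computed, reference)]
    rmsd = math.sqrt(sum(e * e for e in errors) / len(errors))
    max_abs = max(abs(e) for e in errors)
    return rmsd, max_abs


# Hypothetical atomization energies (kcal/mol), NOT taken from the thesis.
computed = [100.5, 200.0, 299.0, 401.5]
reference = [100.0, 200.0, 300.0, 400.0]

rmsd, max_dev = deviation_stats(computed, reference)
within_chemical_accuracy = rmsd < CHEMICAL_ACCURACY
within_max_goal = max_dev < MAX_DEVIATION_GOAL
```

The RMSD summarizes average performance over a dataset, while the maximum deviation flags individual outliers, which is why the two are reported separately.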

Another approach to ab initio quantum chemistry is Density Functional Theory (DFT). Instead of the many-electron wavefunction, which may require a large number of Slater determinants to represent accurately, DFT focuses on the electron density, a three-dimensional quantity. DFT is therefore less computationally demanding than high-level wavefunction methods, and has become the workhorse of quantum chemistry. However, the exact form of the exchange-correlation functional (in the Kohn-Sham formalism) is unknown, which has led to a range of approximate functionals of varying sophistication. Standard approximate functionals are typically derived from parametrization or simple physical models, and are not guaranteed to be transferable across a wide range of systems.

In Chapter 3, we explore the integration of Localization (LOC) methods with DFT to improve its predictive capabilities for a moderate-cost functional and basis set combination, especially for systems with complex electronic structures. Here, we demonstrate our development of an automated DFT-LOC protocol. Furthermore, we benchmark DFT-LOC against a selection of popular functionals from different rungs of Jacob's ladder and two basis sets, one triple-zeta (TZ) and one quadruple-zeta (QZ), for heats of formation and atomization energies of the small-molecule datasets G2, G3, W4-11, and W4-17. B3LYP-LOC with a TZ basis set reduces the RMSD of all datasets to under 2 kcal/mol, which is consistently a significant improvement over the best-performing functional and basis set combination for each dataset. These results highlight the accuracy, efficiency and transferability of the DFT-LOC scheme.

Once a DFT functional has been benchmarked against accurate experimental values or wavefunction theory, it can be selected to run simulations at a larger scale. However, the computational cost of DFT still scales as approximately N³ with system size N (depending on functional and implementation). Machine Learning Force Fields (MLFFs), also known as Machine Learning Interatomic Potentials (MLIPs), are a rapidly growing class of approaches that bridge ab initio accuracy with the scalability of empirical methods such as classical force fields. These methods fit a potential energy surface calculated using an accurate method such as DFT. While typically more computationally expensive than classical force fields because of their more complex functional form, MLFFs open up more avenues for simulating difficult systems at high accuracy, especially those involving events such as reactions and charge transfer. In Chapter 4, we introduce MPNICE, an invariant MLFF with an iterative charge equilibration scheme that can accurately model different oxidation states.
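To make the idea of charge equilibration concrete, the sketch below implements a textbook-style, classical scheme in plain Python. It is not the MPNICE algorithm, whose details are not given here: it assumes a simple quadratic energy in the atomic charges, a bare 1/r Coulomb interaction in atomic units, and a direct linear solve rather than an iterative one. The names `equilibrate_charges`, `chi` (electronegativity) and `eta` (hardness), and all parameter values, are hypothetical.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular matrix")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def equilibrate_charges(positions, chi, eta, total_charge):
    """Minimize E(q) = sum_i (chi_i q_i + eta_i q_i^2 / 2) + sum_{i<j} q_i q_j / r_ij
    subject to sum_i q_i = total_charge, using a Lagrange multiplier."""
    n = len(positions)
    a = [[0.0] * (n + 1) for _ in range(n + 1)]
    b = [0.0] * (n + 1)
    for i in range(n):
        a[i][i] = eta[i]
        for j in range(n):
            if i != j:
                a[i][j] = 1.0 / math.dist(positions[i], positions[j])
        a[i][n] = 1.0  # Lagrange multiplier column
        a[n][i] = 1.0  # total-charge constraint row
        b[i] = -chi[i]
    b[n] = total_charge
    return solve_linear(a, b)[:n]  # drop the multiplier


# Hypothetical two-atom system (atomic units); the second atom is more electronegative.
positions = [(0.0, 0.0, 0.0), (0.0, 0.0, 2.0)]
chi = [0.1, 0.3]
eta = [1.0, 1.0]

neutral = equilibrate_charges(positions, chi, eta, total_charge=0.0)
anion = equilibrate_charges(positions, chi, eta, total_charge=-1.0)
```

Because the total charge enters only through the constraint, the same parameters yield different charge distributions for the neutral and reduced systems, which is the basic reason an explicit charge scheme helps when modeling more than one oxidation state.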

One process that has gained traction as an MLFF application is the critical yet still poorly understood formation of the solid electrolyte interphase (SEI) at the anode of a Li-ion battery (LIB) during the first charge cycle, where electrochemical reduction of the electrolyte generates decomposition products. MLFFs are well positioned to describe these electrochemical processes atomistically, as they are less affected by the limitations in bonding and electron transfer that constrain classical force fields. Nonetheless, training MLFFs to run accurate dynamics of a condensed-phase system with two different and dynamic oxidation states, as in electrochemistry, is challenging for many architectures. In Chapter 5, we show that with MPNICE we are able to train models accurately (within 1 kcal/mol) along multiple potential energy surfaces for LIB-relevant electrolyte systems. Simulations using these models reveal new insights into electrolyte reduction and into considerations for the realistic simulation of charge transfer processes in the condensed phase.

Files

This item is currently under embargo. It will be available starting 2027-07-30.

More About This Work

Academic Units: Chemical Physics
Thesis Advisors: Friesner, Richard A.
Degree: Ph.D., Columbia University
Published Here: September 3, 2025