Theses Doctoral

Automated Model Discovery & Explanation Generation for Physicochemical Systems using Artificial Intelligence

Chakraborty, Arijit

The advent of powerful computational resources, coupled with substantial progress in algorithms and increased dataset sizes, has led to the development of machine learning (ML) models for physicochemical systems. Such models often come at the cost of interpretability and explainability to a domain expert, thereby limiting their usage. Unlike applications such as game playing, recommendation systems, or even chatbots, which have permeated everyday life in recent times, science and engineering systems are steeped in first-principles knowledge that must be leveraged to render meaningful explanations of the mathematical relations that attempt to model complex physicochemical systems. Accordingly, there is a need to develop models that can subsequently be used to explain salient aspects of the physicochemical phenomena. In this work, an end-to-end data-driven model discovery engine and explanation generation artificial intelligence (AI) system is developed and tested on two real-world case studies of varying scales.

Chapter 1 introduces the need for incorporating first-principles knowledge into the modeling workflow in order to obtain more meaningful results. As shall be explained, the cost of a mistake in the science and engineering disciplines may prove to be fatal, and thus it is imperative that the models generated be explainable.

Chapter 2 outlines a data-driven symbolic model discovery engine that outputs an ensemble of the best-performing models for the data provided. This system relies on the a priori inclusion of first-principles knowledge of the system being modeled, resulting in meaningful functional transformations.
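As a rough illustration of what ranking candidate symbolic models against data can look like, the following minimal sketch fits a small library of functional transformations by least squares and returns the best few as an ensemble. It is not the engine described in the dissertation: the transformation library `TRANSFORMS`, the function `discover_models`, the single-term model form y = a*f(x) + b, and the synthetic data are all assumptions made for this example.

```python
import numpy as np

# Hypothetical library of candidate transformations. In a knowledge-guided
# workflow, this set would be chosen from first-principles understanding
# of the system rather than fixed arbitrarily as it is here.
TRANSFORMS = {
    "x": lambda x: x,
    "x^2": lambda x: x**2,
    "sqrt(x)": np.sqrt,
    "log(x)": np.log,
    "1/x": lambda x: 1.0 / x,
}


def discover_models(x, y, top_k=3):
    """Fit y = a*f(x) + b for each candidate f; return the top_k fits by RMSE."""
    results = []
    for name, f in TRANSFORMS.items():
        # Design matrix with the transformed feature and an intercept column.
        A = np.column_stack([f(x), np.ones_like(x)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        rmse = float(np.sqrt(np.mean((A @ coef - y) ** 2)))
        results.append((rmse, name, float(coef[0]), float(coef[1])))
    # Lowest error first; the leading entries form the ensemble.
    results.sort(key=lambda r: r[0])
    return results[:top_k]


# Synthetic, noise-free data used only to exercise the sketch.
x = np.linspace(1.0, 10.0, 50)
y = 3.0 * np.sqrt(x) + 1.0

ensemble = discover_models(x, y)
for rmse, name, a, b in ensemble:
    print(f"y = {a:.3f} * {name} + {b:.3f}   (RMSE = {rmse:.2e})")
```

Because each candidate remains a closed-form expression, a domain expert can inspect the ranked ensemble directly, which is the kind of interpretability a black-box model would not offer.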

Chapter 3 extends the algorithm presented in the preceding chapter to model systems of ordinary and partial differential equations, and demonstrates the efficacy of the approach across a wide variety of case studies.

Chapter 4 applies the modeling engine developed in the preceding chapters to a bubble column aeration problem with real-world data. The resulting models are compared to the analytical ground truth so that the improvement over the analytical model can be captured clearly, something that would not have been possible as effectively with a black-box modeling approach.

Chapter 5 applies the modeling engine to a structure-to-property prediction problem concerning the isothermal adsorption capacity of three adsorbates on zeolite structures. Although this problem differs in scale from the bubble column aeration system, the interpretable modeling engine yielded valuable insights into the most descriptive structural properties.

Having successfully developed interpretable ML models for the physicochemical systems in the preceding chapters, Chapter 6 outlines the development of a large knowledge model (LKM) for automatically generating explanations from ML models. This first requires the extraction and organization of domain knowledge from textual sources, followed by a hierarchical explanation generation strategy that yields increasingly natural language explanations. This blend of domain knowledge with the natural language capabilities of modern tools such as large language models allows for explanation generation of ML models describing complex physicochemical systems. Finally, the dissertation concludes by summarizing the work undertaken and outlining potential future directions of research.

Files

  • Chakraborty_columbia_0054D_19574.pdf (application/pdf, 3.65 MB)

More About This Work

Academic Units
Chemical Engineering
Thesis Advisors
Venkatasubramanian, Venkat
Degree
Ph.D., Columbia University
Published Here
November 5, 2025