Theses Doctoral

Abstractions for Probabilistic Programming to Support Model Development

Bernstein, Ryan

Probabilistic programming is a recent advancement in probabilistic modeling whereby we can express a model as a program with little concern for the details of probabilistic inference.

Probabilistic programming thereby provides a clean and powerful abstraction to its users, letting even non-experts develop clear and concise models that can leverage state-of-the-art computational inference algorithms. This model-as-program representation also presents a unique opportunity: we can apply methods from the study of programming languages directly to probabilistic models. By developing techniques to analyze, transform, or extend the capabilities of probabilistic programs, we can immediately improve the workflow of probabilistic modeling and benefit all of its applications throughout science and industry.

The aim of this dissertation is to support an ideal probabilistic modeling workflow by addressing two limitations of probabilistic programming: that a program can only represent one model, and that the structure of the model that it represents is often opaque to users and to the compiler. In particular, I make the following primary contributions:

(1) I introduce Multi-Model Probabilistic Programming: an extension of probabilistic programming whereby a program can represent a network of interrelated models. This new representation allows users to construct and leverage spaces of models in the same way that probabilistic programs do for individual models. Multi-Model Probabilistic Programming lets us visualize and navigate solution spaces, track and document model development paths, and audit modeler degrees of freedom to mitigate issues like p-hacking. It also provides an efficient computational foundation for the automation of model-space applications like model search, sensitivity analysis, and ensemble methods.

I give a formal language specification and semantics for Multi-Model Probabilistic Programming built on the Stan language; I provide algorithms for the fundamental model-space operations, along with proofs of correctness and efficiency; and I present a prototype implementation, with which I demonstrate a variety of practical applications.

(2) I present a method for automatically transforming probabilistic programs into semantically related forms by using static analysis and constraint solving to recover the structure of their underlying models. In particular, I automate two general model transformations that are required for diagnostic checks, which are important steps of a model-building workflow. Automating these transformations frees the user from manually rewriting their models, thereby avoiding potential correctness and efficiency issues.

(3) I present a probabilistic program analysis tool, “Pedantic Mode”, that automatically warns users about potential statistical issues with the model described by their program. “Pedantic Mode” uses specialized static analysis methods to decompose the structure of the underlying model.

Lastly, I discuss future work in these areas, such as advanced model-space algorithms and other general-purpose model transformations. I also discuss how these ideas may fit into future modeling workflows as technologies.



More About This Work

Academic Units
Computer Science
Thesis Advisors
Wing, Jeannette M.
Gelman, Andrew
Degree
Ph.D., Columbia University
Published Here
June 7, 2023