Academic Commons

Theses Doctoral

Generative Models for Synthetic Biology

Blazejewski, Tomasz

Over the past several years, the fields of synthetic biology and machine learning have both advanced markedly in the scale of their capabilities and the success of their applications. The work presented in this thesis focuses on translating recent advances in machine learning into new applications in synthetic biology. In particular, it is argued that the needs of synthetic biology researchers and practitioners are well met by a class of generative machine learning models, and that the scale of synthetic biology capabilities allows for their successful application across multiple domains of interest.

In Chapter 1, a novel algorithm based on Markov random fields is used to design, for the first time, functional synthetic overlapping gene pairs, with potential applications in improving biological robustness and biosafety. In Chapter 2, motivated by a desire to extend protein sequence modeling to a greater range and diversity of sequences, a variant of a variational autoencoder is used to project hundreds of millions of protein sequences into a continuous latent space with potentially useful representational features. Finally, Chapter 3 moves beyond protein sequences to define a probabilistic, species-specific model of regulatory sequences and explores this model’s utility for the challenging task of predicting gene expression in non-model bacterial organisms.

The machine learning models presented in this thesis are novel applications, to challenging problems in biology, of models traditionally applied to images, text, or sound. Particular attention is devoted to exploiting the large amounts of unlabeled data present in metagenomic sequences and in the genomes of poorly characterized bacteria, in the hope of improving researchers’ ability to manipulate complex biological phenomena.


This item is currently under embargo. It will be available starting 2022-10-02.

More About This Work

Academic Units: Cellular, Molecular and Biomedical Studies
Thesis Advisors: Wang, Harris H.
Degree: Ph.D., Columbia University
Published Here: October 5, 2020