Theses Doctoral

Linking phylogenetic models to population processes, from species trees to genomes

McKenzie, Patrick Franklin

Phylogenetics is transitioning from a history of deep-time analyses with few genes to a future of full-genome data that allows species-level resolution at both deep and shallow time scales. Accompanying this transition is a new focus on demographic parameters, such as ancestral population sizes and gene flow events, in addition to the bifurcating trees that are the cornerstone of the field. Access to more data has highlighted shortcomings of traditional phylogenetic methods that do not account for recombination, selection, population size changes, and inter-species gene flow, and the field is exploring new theory and methods to catch up with the data.

My thesis focuses on signals of demographic processes in genomic data. In exploring these processes, we attempt to avoid biases involved in simply extending old phylogenetic methods -- which have typically been applied to just a handful of genes -- to genomic datasets.

Chapter 1 introduces a tool, ipcoal, for simulating genomic data on phylogenetic trees within a framework that includes recombination and the ability to specify effective population sizes, gene flow events, recombination maps, and differences in generation times. This tool enables, to varying degrees, all further chapters.

Chapter 2 studies the effects of species tree demographic parameters on the resulting linkage among nearby local genealogies, including implications for gene tree and species tree inference.

Chapter 3 examines turnover in local histories along the genome using a theoretical framework, the MS-SMC, which links topological heterogeneity along the genome to the species tree model.

Chapter 4 introduces simcat, a machine-learning method that uses genome-wide SNP data to infer admixture events on a phylogeny without relying on gene tree inference. This is an important step toward reducing the impact of gene tree estimation error over deep evolutionary time scales. Behind the scenes, simcat uses ipcoal to train a machine-learning model that maps patterns in SNP data to the demographic scenarios that produced them.

These chapters demonstrate new phylogenetic theory and methods for refining our ability to infer historical processes at phylogenetic scales, while also illuminating the importance of population-scale processes like gene flow and recombination for shaping genomes sampled in the present day.



More About This Work

Academic Units: Ecology, Evolution, and Environmental Biology
Thesis Advisors: Eaton, Deren A. R.
Degree: Ph.D., Columbia University
Published Here: July 19, 2023