Academic Commons

Theses Doctoral

Disentangling mutation and selection in human genetic variation: promises and pitfalls

Agarwal, Ipsita

A subset of germline mutations that arise de novo each generation are deleterious and may cause severe genetic diseases. Predicting where in the genome and how often we expect to see deleterious mutations requires an understanding both of the distribution of mutation rates and the distribution of fitness effects in the genome. Both aspects are addressed in turn in the two projects described in this thesis.

The distribution of mutations in the genome is poorly understood because germline mutations occur very rarely. In Chapter 1 of this work, we investigated the sources of mutations by using the spectrum of low-frequency variants in 13,860 human X chromosomes and autosomes as a proxy for the spectrum of germline de novo mutations. By comparing the mutation spectrum across genomic compartments with distinct biochemical and sex-specific properties, both within the autosomes and between the X chromosome and the autosomes, we ascribed specific mutation patterns to replication timing and recombination and identified differences in the types of mutations that accrue in males and females. Understanding mutational mechanisms provides a basis for modeling mutation rate variation in the genome, which is ultimately needed to infer the fitness effects of mutations.

In Chapter 2, we used patterns of human genetic variation at methylated CpG sites, known to experience mutations at very high rates, to directly learn about the fitness effects of mutations at these sites. In whole exome sequences now available for 390,000 humans, 99% of putatively neutral, synonymous CpG sites have experienced a C>T mutation; at current sample sizes, not seeing a C>T mutation at these sites indicates strong selection against that mutation. We leveraged the saturation of neutral C>T mutations and the similarity of mutation rates at methylated CpG sites across annotations to identify the subset of sites in a given functional annotation of interest that are likely to be under strong selection. One implication of this work is that for the vast majority of sites in the genome, there will be little information about strong selection even in samples that are many times larger than at present; the distribution of fitness effects at highly mutable CpG sites may then serve as an anchor for what to expect for other types of sites.
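The saturation argument can be illustrated with a toy calculation. The sketch below is not the method used in the thesis. It assumes that mutation events at a neutral site follow a Poisson distribution with a single shared rate, and it ignores genealogical structure and rate heterogeneity. The function names, the relative rate of 0.1, and the annotation fraction of 0.495 are hypothetical values chosen only for illustration; the 99% figure comes from the abstract.

```python
import math


def implied_mutation_load(frac_mutated: float) -> float:
    """Expected number of mutation events per neutral site implied by the
    fraction of sites seen mutated, under a toy Poisson assumption:
    P(no mutation) = exp(-load)."""
    if not 0.0 <= frac_mutated < 1.0:
        raise ValueError("frac_mutated must be in [0, 1)")
    return -math.log(1.0 - frac_mutated)


def frac_mutated_at_relative_rate(load: float, relative_rate: float) -> float:
    """Expected fraction of neutral sites mutated for a site type whose
    mutation rate is `relative_rate` times that of the reference type."""
    return 1.0 - math.exp(-load * relative_rate)


def excess_unmutated_fraction(frac_neutral: float, frac_annotation: float) -> float:
    """Fraction of sites in an annotation that are missing a mutation
    relative to the neutral expectation (floored at zero)."""
    if frac_neutral <= 0.0:
        raise ValueError("frac_neutral must be positive")
    return max(0.0, 1.0 - frac_annotation / frac_neutral)


if __name__ == "__main__":
    # 99% of synonymous methylated CpG sites carry a C>T mutation (from the abstract).
    load = implied_mutation_load(0.99)
    # Hypothetical site type with one tenth of the CpG mutation rate (assumption).
    slow = frac_mutated_at_relative_rate(load, 0.1)
    # Hypothetical annotation in which 49.5% of CpG sites are mutated (assumption).
    excess = excess_unmutated_fraction(0.99, 0.495)
    print(f"implied load per site: {load:.2f}")
    print(f"fraction mutated at 0.1x rate: {slow:.2f}")
    print(f"excess unmutated fraction in annotation: {excess:.2f}")
```

Under these assumptions, a site type with a much lower mutation rate is far from saturation even when it is neutral, so the absence of a mutation there says little about selection. This is the intuition behind using highly mutable CpG sites as an anchor.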

Through the two specific cases described, this work illustrates the potential of large contemporary repositories of human genetic variation to inform human genetics and evolution, as well as their limitations in the absence of suitable models of mutation, selection, and other aspects of the evolutionary process.



More About This Work

Academic Units
Biological Sciences
Thesis Advisors
Przeworski, Molly F.
Degree
Ph.D., Columbia University
Published Here
April 21, 2021