2017 Theses Doctoral
Developing Statistical Methods for Incorporating Complexity in Association Studies
Genome-wide association studies (GWAS) have identified thousands of genetic variants associated with hundreds of human traits. Yet the common variant model tested by traditional GWAS provides only an incomplete explanation for the known genetic heritability of many traits. Many divergent methods have been proposed to address the shortcomings of GWAS, most notably the extension of association methods to rarer variants through whole exome and whole genome sequencing. GWAS methods feature numerous simplifications designed for feasibility and ease of use rather than statistical rigor. Furthermore, no systematic quantification of the performance of GWAS across all traits exists. Beyond improving the utility of data that already exist, a more thorough understanding of the performance of GWAS on common variants may reveal flaws not in the method but in its implementation, flaws that may pose a continued or growing threat to the utility of the rare variant association studies now underway.
This thesis focuses on systematic evaluation and incremental improvement of GWAS modeling. We collect a rich dataset containing standardized association results from all GWAS conducted on quantitative human traits. We find that the majority of published significant results in the field are reported without sufficient information to determine whether they are actually valid. Those that are reported with sufficient information replicate precisely in concordance with their statistical power, provided that replication is conducted in samples of similar ancestry and that accurate per-locus sample sizes are reported. We then turn to the inability of effectively all existing association methods to handle missingness in genetic data, and show that adapting missingness theory from statistics can both increase power and provide a flexible framework for extending most existing tools with minimal effort. Finally, we undertake novel variant association in a schizophrenia cohort from a bottleneck population. We find that the study itself is confounded by nonrandom population sampling and identity-by-descent, which manifest as batch effects correlated with outcome that remain in novel variants after all sample-wide quality control. On the whole, these results emphasize both the past and present utility and reliability of the GWAS model, and the extent to which lessons from the GWAS era must inform genetic studies moving forward.
- Palmer_columbia_0054D_14234.pdf application/pdf 12 MB
More About This Work
- Academic Units: Cellular, Molecular and Biomedical Studies
- Thesis Advisors: Pe'er, Itshack G.
- Degree: Ph.D., Columbia University
- Published Here: October 7, 2017