Theses Doctoral

Statistical Methods for Genetic Studies with Family History of Diseases

Lee, Annie Jehe

The theme of this dissertation is the development of statistical methods for genetic studies with family history of diseases. Family history of disease is a major risk factor for many health outcomes. To study diseases that aggregate in the families of patients, genetic epidemiological studies recruit independent study participants, often referred to as probands, who also provide information on their relatives through a family health history interview. However, because in-person collection of blood samples is costly and some relatives have died, dense genotypes are often collected only in probands and not in their family members. In these designs, estimating the genetic risk of a disease or identifying genetic risk factors for a complex disease is challenging because genotypes are unavailable in relatives and because family members' phenotypes are correlated. This dissertation addresses these barriers in three parts: (1) methods to estimate the genetic risk of a disease more precisely; (2) methods to test for association between genetic markers and correlated phenotypes; and (3) methods to control for population substructure and familial relatedness in genome-wide association studies (GWAS).
In the first part of the dissertation, we propose a method to estimate the age-specific disease risk of a genetic mutation in family studies that permits adjustment for multiple covariates and interaction effects when genotypes in relatives are unobserved. Unlike our previous nonparametric approaches, which do not adjust for covariates, our semiparametric estimation method allows controlling for individual characteristics such as sex, ethnicity, environmental risk factors, and genotypes at other loci. Gene-gene and gene-environment interactions can also be handled within the semiparametric framework. These analyses may provide insight into whether demographic or environmental variables modify the genetic risk of a disease. We examine the performance of the proposed methods by simulation and apply them to estimate the age-specific cumulative risk of Parkinson's disease (PD) in relatives predicted to carry the LRRK2 G2019S mutation. We demonstrate the utility of the estimated carrier risk by designing a future clinical trial under various assumptions.
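The core difficulty in this first part is that a relative's carrier status is unknown, so each relative's outcome is a mixture over carrier and non-carrier risk. The following is a minimal sketch of that mixture idea only, assuming a single fixed age t, no censoring, and no covariates, so it is closer in spirit to the earlier nonparametric approaches than to the semiparametric method described above. Each relative i is assumed to have a known carrier probability q_i (roughly 0.5 for a first-degree relative of a carrier proband when the mutation is rare, near 0 for relatives of non-carrier probands). The hypothetical function `mixture_risk_at_age` solves the least-squares normal equations for the carrier risk F1(t) and non-carrier risk F0(t) under the assumed model E[Y_i] = q_i F1(t) + (1 - q_i) F0(t).

```python
def _clip01(value):
    """Clamp a number to [0, 1], since the estimates are probabilities."""
    return min(max(value, 0.0), 1.0)


def mixture_risk_at_age(carrier_probs, affected):
    """Least-squares estimates of (F1, F0) at a single fixed age t.

    Assumed model: E[Y_i] = q_i * F1 + (1 - q_i) * F0, where q_i is relative
    i's carrier probability and Y_i = 1 if relative i is affected by age t.
    Censoring and covariates are NOT handled in this sketch.
    """
    # Cross-products for regressing Y on (q, 1 - q) with no intercept.
    s11 = sum(q * q for q in carrier_probs)
    s12 = sum(q * (1.0 - q) for q in carrier_probs)
    s22 = sum((1.0 - q) ** 2 for q in carrier_probs)
    b1 = sum(q * y for q, y in zip(carrier_probs, affected))
    b2 = sum((1.0 - q) * y for q, y in zip(carrier_probs, affected))
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        raise ValueError("carrier probabilities must vary to separate F1 from F0")
    # Solve the 2x2 normal equations by Cramer's rule.
    f1 = (s22 * b1 - s12 * b2) / det
    f0 = (s11 * b2 - s12 * b1) / det
    return _clip01(f1), _clip01(f0)


# Hypothetical data: 10 relatives of carrier probands (q = 0.5, 3 affected by t)
# and 10 relatives of non-carrier probands (q taken as 0, 1 affected by t).
q = [0.5] * 10 + [0.0] * 10
y = [1] * 3 + [0] * 7 + [1] * 1 + [0] * 9
f1, f0 = mixture_risk_at_age(q, y)
```

With only two distinct carrier probabilities, the fit reproduces the two group means exactly: the non-carrier group gives F0 = 0.1, and the carrier-proband group mean 0.3 = 0.5 F1 + 0.5 F0 gives F1 = 0.5. The contrast between the two groups is what identifies the carrier risk even though no relative is genotyped.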
The second part of the dissertation extends the single-variant setup of the first part to genome-wide genotype data and focuses on genetic association tests. Here, we propose a computationally efficient multilevel model to analyze the association of a genetic marker with correlated binary phenotypes in family studies. Our method accounts for both random polygenic effects and shared non-genetic familial effects while handling unavailable genotypes in relatives. To discover genetic variants of a complex disorder that aggregates in the families of patients, we consider the combined data of probands with genome-wide genotypes and family history of diseases in relatives (GWAS+FH). To allow large-scale genetic testing in GWAS+FH, we handle the unobserved genotypes and estimate the random effects at reduced computational cost through a fast and stable EM-type algorithm and a score test. Through simulations, we demonstrate that incorporating family history of disease improves the efficiency and power to detect disease-associated genetic variants compared with methods that use proband data alone, which underscores the importance of family studies. Lastly, we apply these methods to discover genetic variants associated with the risk of Alzheimer's disease (AD) in GWAS+FH data collected from Caribbean Hispanics in the Washington Heights-Inwood Columbia Aging Project (WHICAP). We identified several genetic variants that would not have been discovered by a GWAS using proband data alone.
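In GWAS+FH data a relative's genotype is unobserved, but its conditional expectation given the proband's genotype is available. Under Hardy-Weinberg equilibrium and no inbreeding (both assumptions), a relative with kinship coefficient φ to a proband carrying g copies of an allele with frequency p has expected allele count 2p + 2φ(g - 2p), which reduces to g/2 + p for first-degree relatives (φ = 1/4). The sketch below plugs such expected dosages into a standard logistic-regression score test. It deliberately treats all subjects as independent and omits the polygenic and shared familial random effects that the multilevel model above accounts for, so it illustrates only the expected-genotype and score-test ingredients, not the dissertation's method; `expected_relative_dosage` and `logistic_score_test` are hypothetical names.

```python
import math


def expected_relative_dosage(proband_dosage, allele_freq, kinship):
    """E[relative's allele count | proband's allele count].

    Assumes Hardy-Weinberg equilibrium and no inbreeding; kinship = 1/4 for
    parents, offspring and full siblings of the proband.
    """
    return 2.0 * allele_freq + 2.0 * kinship * (proband_dosage - 2.0 * allele_freq)


def logistic_score_test(dosages, phenotypes):
    """Score test of beta = 0 in logit P(Y = 1) = alpha + beta * x.

    Treats every subject as independent (no random effects).
    Returns (chi-square statistic with 1 df, p-value).
    """
    n = len(dosages)
    y_bar = sum(phenotypes) / n  # MLE of P(Y = 1) under the null model
    x_bar = sum(dosages) / n
    # Score for beta evaluated at the null MLE.
    u = sum((x - x_bar) * (y - y_bar) for x, y in zip(dosages, phenotypes))
    # Efficient information for beta after profiling out the intercept.
    v = y_bar * (1.0 - y_bar) * sum((x - x_bar) ** 2 for x in dosages)
    if v == 0.0:
        raise ValueError("need variation in both dosages and phenotypes")
    stat = u * u / v
    # Upper tail of a chi-square(1): P(Z^2 > stat) = erfc(sqrt(stat / 2)).
    return stat, math.erfc(math.sqrt(stat / 2.0))


# Hypothetical example: probands have observed dosages; each has one
# first-degree relative whose dosage is replaced by its expectation (p = 0.3).
proband_g = [0, 1, 2, 1, 0, 2]
proband_y = [0, 0, 1, 1, 0, 1]
relative_x = [expected_relative_dosage(g, 0.3, 0.25) for g in proband_g]
relative_y = [0, 1, 1, 0, 0, 1]
stat, p_value = logistic_score_test(proband_g + relative_x, proband_y + relative_y)
```

A score test needs only the null-model fit, which is fitted once and reused for every marker; this is one reason score tests are attractive for genome-wide scans.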
In the third part of the dissertation, we build on the previously introduced random effects to propose a genetic association test that controls confounding due to familial relatedness in GWAS. Correcting for this confounding is critical both to minimize spurious associations and to maximize power to detect true association signals. When pedigree data are available, our method uses the polygenic effects together with the shared non-genetic familial effects to control confounding due to familial relatedness. In an application to the WHICAP Caribbean Hispanic probands, we show that this approach controls familial relatedness as well as or better than using principal components in GWAS. Notably, our method controls this confounding by using family history data, without requiring dense genotypes in the relatives. We conclude the dissertation by discussing future extensions of this work.
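Both the polygenic and the shared familial effects are random effects whose covariance is determined by the pedigree. A common parameterization, assumed here rather than taken from the dissertation, is σ_g² · 2Φ + σ_c² · C, where Φ is the kinship matrix and C_ij = 1 if individuals i and j belong to the same family (0 otherwise). The sketch computes Φ with the standard recursive pedigree algorithm (parents listed before children, founders unrelated and non-inbred) and assembles that covariance; `kinship_matrix` and `family_covariance` are hypothetical names.

```python
def kinship_matrix(pedigree):
    """Kinship coefficients from a pedigree.

    pedigree: list of (id, father_id, mother_id) tuples with parents listed
    before their children; founders use None for both parents and are
    assumed unrelated and non-inbred.
    Returns (list of ids, n x n kinship matrix as nested lists).
    """
    n = len(pedigree)
    index = {}
    phi = [[0.0] * n for _ in range(n)]
    for k, (pid, father, mother) in enumerate(pedigree):
        if father is None and mother is None:
            # Founder: self-kinship 1/2, unrelated to everyone listed earlier.
            phi[k][k] = 0.5
        elif father in index and mother in index:
            f, m = index[father], index[mother]
            # Self-kinship grows with inbreeding, i.e. with the parents' kinship.
            phi[k][k] = 0.5 * (1.0 + phi[f][m])
            # Kinship with anyone listed earlier averages over the two parents.
            for j in range(k):
                phi[k][j] = phi[j][k] = 0.5 * (phi[f][j] + phi[m][j])
        else:
            raise ValueError(f"both parents of {pid} must be known and listed first")
        index[pid] = k
    return [row[0] for row in pedigree], phi


def family_covariance(phi, family_ids, sigma_g2, sigma_c2):
    """Assumed random-effect covariance: sigma_g2 * 2*Phi + sigma_c2 * C."""
    n = len(phi)
    return [
        [
            sigma_g2 * 2.0 * phi[i][j]
            + (sigma_c2 if family_ids[i] == family_ids[j] else 0.0)
            for j in range(n)
        ]
        for i in range(n)
    ]


# Hypothetical nuclear family: two founders and their two children.
ped = [("dad", None, None), ("mom", None, None),
       ("kid1", "dad", "mom"), ("kid2", "dad", "mom")]
ids, phi = kinship_matrix(ped)
cov = family_covariance(phi, ["fam1"] * len(ids), sigma_g2=0.8, sigma_c2=0.3)
```

Here parent-child and sibling pairs get kinship 1/4 (genetic correlation 1/2 after doubling), while the spouses share only the familial term σ_c². Separating the two terms is what lets a model distinguish genetic from non-genetic aggregation within families.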



More About This Work

Thesis Advisors
Wang, Yuanjia
Degree
Ph.D., Columbia University
Published Here
April 30, 2019