Academic Commons

Theses Doctoral

Statistical Methods for Epigenetic Data

Wang, Ya

DNA methylation plays a crucial role in human health, especially in cancer. Traditional DNA methylation analysis aims to identify CpGs/genes with differential methylation (DM) between experimental groups. Differential variability (DV) has recently been observed to contribute to cancer heterogeneity and has also been shown to be essential for detecting early DNA methylation alterations, notably epigenetic field defects. Moreover, studies have demonstrated that environmental factors may modify the effect of DNA methylation on health outcomes, or vice versa. This dissertation therefore develops new statistical methods for epigenetic data that focus on DV and interactions, areas where efficient analytical tools are lacking. First, because neighboring CpG sites are usually highly correlated, we introduced a new method to detect differentially methylated regions (DMRs) that combines DM and DV signals between diseased and non-diseased groups. Next, using both DM and DV signals, we considered the problem of identifying epigenetic field defects, where CpG-site-level DM and DV signals are minimal and hard to detect with existing methods. We proposed a weighted epigenetic distance-based method that accumulates CpG-site-level DM and DV signals within a gene. Here, DV signals were captured by a pseudo-data matrix constructed from centered quadratic methylation measures. CpG-site-level association signal annotations were introduced as weights in the distance calculations to up-weight signal CpGs and down-weight noise CpGs, further boosting study power. Lastly, we extended the weighted epigenetic distance-based method to incorporate DNA methylation-by-environment interactions when detecting the overall association between DNA methylation and health outcomes. A pseudo-data matrix of cross-product terms between DNA methylation and environmental factors was constructed to capture their interactions.
The superior performance of the proposed methods was demonstrated through extensive simulation studies and applications to multiple real DNA methylation datasets.
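A minimal sketch of the distance-based idea described in the abstract, in plain Python with no third-party packages. Everything here is an assumption for illustration, not the dissertation's implementation: the function names are hypothetical, "centered quadratic" is taken to mean the squared deviation of each CpG from its overall sample mean, columns are standardized before distances are computed, the CpG weights are made-up numbers, and the association test is a generic Gower-centered score statistic T = y_c' G y_c with a permutation p-value, which may differ from the statistic the author used. The sketch shows why a pseudo-data matrix helps: for a (weighted) Euclidean distance, Gower centering returns the centered inner-product matrix of the columns, so T responds only to group differences in column means. Appending squared-deviation columns turns a difference in variance (DV) into a difference in means that T can detect.

```python
import random
from math import sqrt
from statistics import fmean, pstdev


def center_columns(X):
    """Subtract each column's (CpG's) sample mean from that column."""
    means = [fmean(col) for col in zip(*X)]
    return [[x - m for x, m in zip(row, means)] for row in X]


def dv_pseudo_data(X):
    """Assumed 'centered quadratic' measure: (x_ij - mean_j)^2 for subject i, CpG j."""
    return [[v * v for v in row] for row in center_columns(X)]


def interaction_pseudo_data(X, E):
    """Cross-products of centered methylation with one centered environmental factor E."""
    e_mean = fmean(E)
    return [[x * (e - e_mean) for x in row] for row, e in zip(center_columns(X), E)]


def column_bind(*blocks):
    """Place matrices with the same rows (subjects) side by side."""
    return [[v for part in rows for v in part] for rows in zip(*blocks)]


def standardize_columns(Z):
    """Scale columns to mean 0 and SD 1 (constant columns become 0) so blocks are comparable."""
    sds = [pstdev(col) for col in zip(*Z)]
    return [[v / s if s > 0 else 0.0 for v, s in zip(row, sds)] for row in center_columns(Z)]


def weighted_distance_matrix(Z, w):
    """Weighted Euclidean distance between subjects; w has one weight per column."""
    n = len(Z)
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(i + 1, n):
            d = sqrt(sum(wj * (a - b) ** 2 for wj, a, b in zip(w, Z[i], Z[k])))
            D[i][k] = D[k][i] = d
    return D


def gower_center(D):
    """Double-center A = -D^2/2, giving a subject-by-subject similarity matrix G."""
    A = [[-0.5 * d * d for d in row] for row in D]
    means = [fmean(row) for row in A]  # row means equal column means because A is symmetric
    grand = fmean(means)
    n = len(A)
    return [[A[i][k] - means[i] - means[k] + grand for k in range(n)] for i in range(n)]


def score_statistic(G, y):
    """T = y_c' G y_c, large when subjects close in methylation share outcomes."""
    ym = fmean(y)
    yc = [v - ym for v in y]
    n = len(yc)
    return sum(yc[i] * G[i][k] * yc[k] for i in range(n) for k in range(n))


def permutation_pvalue(G, y, n_perm=999, seed=1):
    """Permutation p-value for T: shuffling y breaks any methylation-outcome link."""
    rng = random.Random(seed)
    observed = score_statistic(G, y)
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if score_statistic(G, y_perm) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical toy gene: 20 subjects x 4 CpGs, same mean in both groups,
    # but cases (y = 1) are more variable at every CpG (a pure DV signal).
    rng = random.Random(0)
    y = [0] * 10 + [1] * 10
    X = [[rng.gauss(0.5, 0.02 if yi == 0 else 0.08) for _ in range(4)] for yi in y]
    w = [1.0, 1.0, 0.5, 0.5]  # hypothetical CpG-level weights
    Z = standardize_columns(column_bind(X, dv_pseudo_data(X)))  # DM block, then DV block
    G = gower_center(weighted_distance_matrix(Z, w * 2))  # reuse each CpG's weight per block
    G_dm = gower_center(weighted_distance_matrix(standardize_columns(X), w))
    print("DM+DV permutation p-value:", permutation_pvalue(G, y))
    print("DM-only permutation p-value:", permutation_pvalue(G_dm, y))
```

The interaction extension can be sketched the same way by column-binding `interaction_pseudo_data(X, E)`, for a hypothetical environmental vector `E`, instead of or alongside the DV block before computing distances.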

Files

This item is currently under embargo. It will be available starting 2021-04-24.

More About This Work

Academic Units
Biostatistics
Thesis Advisors
Wang, Shuang
Degree
D.P.H., Mailman School of Public Health, Columbia University
Published Here
April 30, 2019