Analysis of genome-wide association data by large-scale Bayesian logistic regression

Wang, Yuanjia; Sha, Nanshi; Fang, Yixin

Single-locus analysis is often used to analyze genome-wide association (GWA) data, but such analysis is subject to severe adjustment for multiple comparisons. Multivariate logistic regression has been proposed to fit a multi-locus model for case-control data. However, when the sample size is much smaller than the number of single-nucleotide polymorphisms (SNPs), or when correlation among SNPs is high, traditional multivariate logistic regression breaks down. To accommodate the scale of data from a GWA study while controlling for collinearity and overfitting in a high-dimensional predictor space, we propose a variable selection procedure using Bayesian logistic regression. We explored a connection between Bayesian regression with certain priors and L1- and L2-penalized logistic regression. After analyzing a large number of SNPs simultaneously in a Bayesian regression, we selected important SNPs for further consideration. With far fewer SNPs of interest, the problems of multiple comparisons and collinearity are less severe. We conducted simulation studies to examine the probability of correctly selecting disease-contributing SNPs and applied the developed methods to analyze the Genetic Analysis Workshop 16 North American Rheumatoid Arthritis Consortium data.
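
The connection between priors and penalties mentioned in the abstract can be illustrated with a small sketch. A standard instance of such a connection is that the posterior mode under independent Laplace (double-exponential) priors coincides with the L1-penalized logistic regression estimate, while Gaussian priors correspond to the L2 penalty. The abstract does not state which priors or which fitting algorithm the authors used, so everything below is an assumption made for illustration: the simulated genotypes, the effect sizes, the function names (`simulate`, `fit_l1_logistic`, `soft_threshold`), and the choice of proximal gradient descent as the optimizer.

```python
import math
import random


def soft_threshold(z, t):
    """Proximal operator of t * |z|: shrink z toward zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def sigmoid(u):
    """Numerically stable logistic function."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)


def simulate(n=200, p=30, causal=(0, 1), effect=1.0, maf=0.3, seed=1):
    """Hypothetical data: centered additive genotypes and 0/1 case status."""
    rng = random.Random(seed)
    # Each genotype is the sum of two Bernoulli(maf) alleles, centered at 2 * maf.
    X = [[(rng.random() < maf) + (rng.random() < maf) - 2 * maf
          for _ in range(p)] for _ in range(n)]
    y = []
    for row in X:
        eta = sum(effect * row[j] for j in causal)
        y.append(1 if rng.random() < sigmoid(eta) else 0)
    return X, y


def fit_l1_logistic(X, y, lam, step=0.5, iters=300):
    """Posterior mode under Laplace priors, i.e. L1-penalized logistic
    regression, fitted by proximal gradient descent. The intercept is
    left unpenalized."""
    n, p = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * p
    for _ in range(iters):
        # Residuals: fitted probability minus observed status.
        resid = [sigmoid(b0 + sum(b * x for b, x in zip(beta, row))) - yi
                 for row, yi in zip(X, y)]
        g0 = sum(resid) / n
        grad = [sum(r * row[j] for r, row in zip(resid, X)) / n
                for j in range(p)]
        b0 -= step * g0
        # Gradient step on the log-likelihood, then shrinkage from the prior.
        beta = [soft_threshold(b - step * g, step * lam)
                for b, g in zip(beta, grad)]
    return b0, beta


if __name__ == "__main__":
    X, y = simulate()
    _, beta = fit_l1_logistic(X, y, lam=0.05)
    selected = [j for j, b in enumerate(beta) if b != 0.0]
    print("SNPs with nonzero coefficients:", selected)
```

SNPs whose coefficients remain nonzero would be the ones carried forward for further consideration. A larger `lam` (a tighter prior) selects fewer SNPs, and replacing the soft-thresholding step with a multiplicative shrinkage would give the L2 (Gaussian prior) analogue, which shrinks coefficients without setting them exactly to zero.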

Files

  • fe3bce59e3cd7fb0a8aaa8930f56df7b.zip (application/zip, 705 KB)

Also Published In

Title
BMC Proceedings

More About This Work

Academic Units
Biostatistics
Publisher
BioMed Central
Published Here
September 8, 2014