2014 Articles
A sequential Monte Carlo framework for haplotype inference in CNV/SNP genotype data
Copy number variations (CNVs) are abundant in the human genome. They have been associated with complex traits in genome-wide association studies (GWAS) and are expected to continue playing an important role in identifying the etiology of disease phenotypes. Thanks to current high-throughput whole-genome single-nucleotide polymorphism (SNP) arrays, datasets now exist that contain both integer copy numbers in CNV regions and SNP genotypes. At the same time, haplotypes, which have been shown to offer advantages over genotypes in identifying disease traits, are available for SNP genotypes but largely unavailable for CNV/SNP data because of insufficient computational tools. We introduce a new framework for inferring haplotypes in CNV/SNP data using a sequential Monte Carlo sampling scheme, ‘Tree-Based Deterministic Sampling CNV’ (TDSCNV). We compare our method with polyHap (v2.0), the only currently available software able to perform inference in CNV/SNP genotypes, on datasets with varying numbers of markers. We find that both algorithms show similar accuracy, but TDSCNV is an order of magnitude faster and scales linearly with the number of markers and the number of individuals, and thus could be the method of choice for haplotype inference in such datasets. Our method is implemented in the TDSCNV package, which is available for download at www.ee.columbia.edu/~anastas/tdscnv.
Files
- 1687-4153-2014-7.xml (application/xml, 145 KB)
- 1687-4153-2014-7.pdf (application/pdf, 855 KB)
Also Published In
- Title: EURASIP Journal on Bioinformatics and Systems Biology
- DOI: https://doi.org/10.1186/1687-4153-2014-7
More About This Work
- Academic Units: Electrical Engineering
- Published Here: September 23, 2014