2011 Articles
A Hidden Markov Model for Copy Number Variant prediction from whole genome resequencing data
Motivation: Copy Number Variants (CNVs) are important genetic factors for studying human diseases. While high-throughput whole genome re-sequencing provides multiple lines of evidence for detecting CNVs, computational algorithms need to be tailored to different types or sizes of CNVs under different experimental designs.

Results: To achieve optimal power and resolution in detecting CNVs at low depth of coverage, we implemented a Hidden Markov Model that integrates both depth of coverage and mate-pair relationship. The novelty of our algorithm is that we infer the likelihood of carrying a deletion jointly from multiple mate pairs in a region, without requiring any single mate pair to be an obvious outlier. By integrating all useful information in a comprehensive model, our method is able to detect medium-size deletions (200-2000 bp) at low depth (<10× per sample). We applied the method to simulated data and demonstrate that the power of detecting medium-size deletions is close to theoretical values.

Availability: A program implemented in Java, Zinfandel, is available at http://www.cs.columbia.edu/~itsik/zinfandel/
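The idea of pooling evidence across mate pairs can be illustrated with a toy log-likelihood-ratio calculation. This is only a sketch under stated assumptions, not the Zinfandel implementation and not the full Hidden Markov Model: it assumes normally distributed insert sizes, Poisson read counts, and a heterozygous deletion in which a fraction `carrier_fraction` of spanning pairs comes from the deleted haplotype. All function names, parameter names, and numbers below are hypothetical.

```python
import math


def normal_logpdf(x, mean, sd):
    """Log density of a normal distribution."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def poisson_logpmf(k, lam):
    """Log probability of k events under a Poisson distribution with mean lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def mate_pair_llr(spans, mu, sigma, del_size, carrier_fraction):
    """Summed log-likelihood ratio (deletion vs. no deletion) over mate pairs.

    Assumption: under the deletion hypothesis each observed span is drawn from
    a two-component mixture, shifted by del_size with probability
    carrier_fraction and unshifted otherwise.
    """
    total = 0.0
    for s in spans:
        shifted = normal_logpdf(s, mu + del_size, sigma)
        unshifted = normal_logpdf(s, mu, sigma)
        total += math.log((1 - carrier_fraction)
                          + carrier_fraction * math.exp(shifted - unshifted))
    return total


def depth_llr(read_count, expected_reads, carrier_fraction):
    """Log-likelihood ratio for the read count inside the candidate region.

    Assumption: the deletion removes a carrier_fraction share of the expected
    reads (0.5 for a heterozygous deletion); carrier_fraction must be below 1.
    """
    reduced = expected_reads * (1 - carrier_fraction)
    return poisson_logpmf(read_count, reduced) - poisson_logpmf(read_count, expected_reads)


# Hypothetical library: mean insert size 400 bp, standard deviation 50 bp.
mu, sigma = 400.0, 50.0
spans = [480, 510, 495, 520, 470, 505]          # observed mate-pair spans (bp)
z_scores = [(s - mu) / sigma for s in spans]    # none exceeds a 3-sigma cutoff

mate = mate_pair_llr(spans, mu, sigma, del_size=100.0, carrier_fraction=0.5)
depth = depth_llr(read_count=6, expected_reads=12.0, carrier_fraction=0.5)
combined = mate + depth                          # both signals, assumed independent

print(max(z_scores), mate, depth, combined)
```

In this toy data set no individual pair would pass a typical three-standard-deviation outlier filter, yet the summed log-likelihood ratio is clearly positive, and the depth term adds further support. A model in the spirit of the abstract would use such per-region likelihoods as emission terms in a Hidden Markov Model rather than evaluating them in isolation.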
Files
- 1471-2105-12-S6-S4.pdf (application/pdf, 729 KB)
- 1471-2105-12-S6-S4.xml (application/xml, 37.7 KB)
Also Published In
- Title: BMC Bioinformatics
- DOI: https://doi.org/10.1186/1471-2105-12-S6-S4
More About This Work
- Academic Units: Biomedical Informatics
- Publisher: BioMed Central
- Published Here: September 8, 2014