Academic Commons

A Hidden Markov Model for Copy Number Variant prediction from whole genome resequencing data

Shen, Yufeng; Gu, Yiwei; Pe’er, Itsik

Motivation: Copy Number Variants (CNVs) are important genetic factors for studying human diseases. While high-throughput whole genome re-sequencing provides multiple lines of evidence for detecting CNVs, computational algorithms need to be tailored to different types and sizes of CNVs under different experimental designs.

Results: To achieve optimal power and resolution for detecting CNVs at low depth of coverage, we implemented a Hidden Markov Model that integrates both depth of coverage and mate-pair relationships. The novelty of our algorithm is that we infer the likelihood of carrying a deletion jointly from multiple mate pairs in a region, without requiring any single mate pair to be an obvious outlier. By integrating all useful information in a comprehensive model, our method is able to detect medium-size deletions (200-2000 bp) at low depth (<10× per sample). We applied the method to simulated data and demonstrated that the power to detect medium-size deletions is close to theoretical values.

Availability: A program implemented in Java, Zinfandel, is available at http://www.cs.columbia.edu/~itsik/zinfandel/
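The idea of pooling evidence across mate pairs can be illustrated with a small numerical sketch. This is not the Zinfandel implementation or the paper's model: it assumes, purely for illustration, that the mapped span of a mate pair is Gaussian, that a deletion of size d shifts the expected span by d, and that pairs are independent. The library mean, standard deviation, deletion size, and function names are all hypothetical.

```python
import math


def normal_logpdf(x, mean, sd):
    """Log density of a Normal(mean, sd) distribution at x."""
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd * math.sqrt(2.0 * math.pi))


def deletion_log_likelihood_ratio(spans, mean, sd, deletion_size):
    """Sum per-pair log-likelihood ratios: deletion vs. no deletion.

    Assumption (illustrative only): a mate pair spanning a deletion of
    size `deletion_size` maps with its expected span increased by that amount.
    """
    return sum(
        normal_logpdf(s, mean + deletion_size, sd) - normal_logpdf(s, mean, sd)
        for s in spans
    )


# Hypothetical library: mapped span ~ Normal(mean=3000, sd=300).
MEAN, SD = 3000.0, 300.0
DELETION = 400.0

# Eight mate pairs, each only modestly longer than expected.
spans = [3400.0] * 8

# Largest single-pair deviation, in standard deviations.
max_z = max((s - MEAN) / SD for s in spans)

# Joint evidence from all pairs together.
llr = deletion_log_likelihood_ratio(spans, MEAN, SD, DELETION)

print(f"largest single-pair z-score: {max_z:.2f}")
print(f"joint log-likelihood ratio:  {llr:.2f}")
```

Under these assumed numbers, every pair lies about 1.3 standard deviations above the mean, so none would be flagged by a typical per-pair outlier cutoff. The summed log-likelihood ratio is nevertheless about 7.1, which corresponds to odds of more than 1000 to 1 in favor of the deletion.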

Files

  • 1471-2105-12-S6-S4.pdf (729 KB)
  • 1471-2105-12-S6-S4.xml (37.7 KB)

Also Published In

Title: BMC Bioinformatics
DOI: https://doi.org/10.1186/1471-2105-12-S6-S4

More About This Work

Academic Units: Biomedical Informatics
Publisher: BioMed Central
Published Here: September 8, 2014