2008 Reports
Whole Population, Genomewide Mapping of Hidden Relatedness
The ability to identify and quantify genealogical relationships between individuals within a population is an important step in accurately using such data for disease analysis and improving our understanding of demography. However, exhaustive pairwise analysis, which has been successful in small cohorts, cannot keep up with the current torrent of genotype data. We present GERMLINE, a robust algorithm for identifying pairwise segmental sharing which scales linearly with the number of input individuals. Our approach is based on a dictionary of haplotypes, used to efficiently discover short exact matches between individuals and then expand these matches to identify long, nearly identical shared segments that are indicative of relatedness. We use GERMLINE to comprehensively survey hidden relatedness both in the HapMap and in a densely typed island population of 3,000 individuals. We verify that GERMLINE is in concordance with other methods when they can process the data, and that it also facilitates analysis of larger-scale studies. We also demonstrate novel applications of precise analysis of hidden relatedness to the detection of haplotype phasing errors and structural variation. We show that shared segment discovery can help identify phasing errors and potentially resolve them. Finally, we use detected identity of genomic segments to expose polymorphic deletions that are otherwise challenging to detect, with 8/14 deletions in the HapMap samples and 153/200 deletions in the island data having independent experimental validation.
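The seed-and-extend idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `shared_segments`, the fixed-size windowing, the string encoding of haplotypes, and the parameters `window`, `max_mismatch`, and `min_windows` are all assumptions made for illustration. Each haplotype is cut into windows, a dictionary groups haplotypes that carry the same exact word in a window (the seeds), and each seed is then extended through neighboring windows as long as they differ by at most a few markers.

```python
from collections import defaultdict
from itertools import combinations


def shared_segments(haplotypes, window=4, max_mismatch=1, min_windows=3):
    """Return {(a, b): [(start_marker, end_marker), ...]} with end exclusive.

    haplotypes: dict mapping an id to an equal-length string of alleles.
    All parameter names and defaults are illustrative assumptions.
    """
    n_markers = len(next(iter(haplotypes.values())))
    n_windows = n_markers // window  # a trailing partial window is ignored

    def word(hap_id, w):
        return haplotypes[hap_id][w * window:(w + 1) * window]

    def close_enough(a, b, w):
        mismatches = sum(x != y for x, y in zip(word(a, w), word(b, w)))
        return mismatches <= max_mismatch

    segments = defaultdict(list)
    covered = set()  # (a, b, w) windows already inside an extended run

    for w in range(n_windows):
        # Dictionary step: bucket haplotypes by the exact word in this window.
        buckets = defaultdict(list)
        for hap_id in haplotypes:
            buckets[word(hap_id, w)].append(hap_id)

        # Only haplotypes in the same bucket are ever compared.
        for ids in buckets.values():
            for a, b in combinations(sorted(ids), 2):
                if (a, b, w) in covered:
                    continue
                # Extension step: grow the seed in both directions while
                # neighboring windows are nearly identical.
                lo = w
                while lo > 0 and close_enough(a, b, lo - 1):
                    lo -= 1
                hi = w
                while hi + 1 < n_windows and close_enough(a, b, hi + 1):
                    hi += 1
                covered.update((a, b, k) for k in range(lo, hi + 1))
                if hi - lo + 1 >= min_windows:
                    segments[(a, b)].append((lo * window, (hi + 1) * window))
    return dict(segments)


example = {
    "A": "0101" "1100" "0011" "1010",
    "B": "0101" "1101" "0011" "0101",
    "C": "1010" "0011" "1100" "0000",
}
print(shared_segments(example))  # {('A', 'B'): [(0, 12)]}
```

In this toy input, A and B match exactly in the first window, differ at one marker in the second, and match again in the third, so the seed grows into one 12-marker segment. C never shares an exact word with either, so it is never compared beyond the dictionary lookup. The sketch still enumerates pairs inside each bucket, so it only approximates the scaling behavior claimed for GERMLINE when buckets stay small.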
Files
- cucs-027-08.pdf (application/pdf, 919 KB)
More About This Work
- Academic Units: Computer Science
- Publisher: Department of Computer Science, Columbia University
- Series: Columbia University Computer Science Technical Reports, CUCS-027-08
- Published Here: April 26, 2011