Whole Population, Genomewide Mapping of Hidden Relatedness
Gusev
Alexander
author
Columbia University. Computer Science
Lowe
Jennifer K.
author
Stoffel
Markus
author
Daly
Mark
author
Altshuler
David
author
Friedman
Jeffrey M.
author
Breslow
Jan L.
author
Pe'er
Itshack G.
author
Columbia University. Computer Science
Columbia University. Computer Science
originator
contributor
text
Technical reports
New York
Department of Computer Science, Columbia University
2008
The ability to identify and quantify genealogical relationships between individuals within a population is an important step in accurately using genotype data for disease analysis and in improving our understanding of demography. However, exhaustive pairwise analysis, which has been successful in small cohorts, cannot keep up with the current torrent of genotype data. We present GERMLINE, a robust algorithm for identifying pairwise segmental sharing which scales linearly with the number of input individuals. Our approach is based on a dictionary of haplotypes, used to efficiently discover short exact matches between individuals and then expand these matches to identify long, nearly identical shared segments that are indicative of relatedness. We use GERMLINE to comprehensively survey hidden relatedness both in the HapMap and in a densely typed island population of 3,000 individuals. We verify that GERMLINE agrees with other methods where they can process the data, and that it also facilitates analysis of larger-scale studies. We also demonstrate novel applications of precise analysis of hidden relatedness to the detection of haplotype phasing errors and structural variation. We show that shared segment discovery can help identify phasing errors and potentially resolve them. Finally, we use detected identity of genomic segments to expose polymorphic deletions that are otherwise challenging to detect, with 8/14 deletions in the HapMap samples and 153/200 deletions in the island data having independent experimental validation.
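The seed-and-extend idea described in the abstract (a dictionary of haplotypes for short exact matches, then expansion into long nearly identical segments) can be illustrated with a minimal Python sketch. This is not the GERMLINE implementation. The function name `shared_segments`, the parameters `window`, `min_windows` and `max_mismatch`, the fixed-width windowing, and the per-window mismatch tolerance are all assumptions made for illustration, and haplotypes are assumed to be phased, equal-length strings of allele codes. The sketch also enumerates pairs within each dictionary bucket, so it does not by itself demonstrate the linear scaling reported for GERMLINE.

```python
from collections import defaultdict
from itertools import combinations


def shared_segments(haplotypes, window=4, min_windows=3, max_mismatch=1):
    """Hypothetical seed-and-extend sketch over phased haplotype strings.

    haplotypes: dict mapping an individual's name to a string of allele codes.
    Returns sorted (name_a, name_b, start_marker, end_marker) tuples, with
    end_marker exclusive.
    """
    n_windows = min(len(h) for h in haplotypes.values()) // window

    def mismatches(a, b, w):
        # Count differing markers between a and b inside window w.
        s = slice(w * window, (w + 1) * window)
        return sum(x != y for x, y in zip(haplotypes[a][s], haplotypes[b][s]))

    # Seed step: in each window, a dictionary keyed by the haplotype word
    # groups individuals that match exactly over that window.
    seeds = defaultdict(set)  # (a, b) -> windows where a and b match exactly
    for w in range(n_windows):
        buckets = defaultdict(list)
        for name, hap in haplotypes.items():
            buckets[hap[w * window:(w + 1) * window]].append(name)
        for names in buckets.values():
            for a, b in combinations(sorted(names), 2):
                seeds[(a, b)].add(w)

    # Extend step: grow each seed left and right while neighbouring windows
    # stay within the mismatch tolerance (an assumed criterion).
    results = []
    for (a, b), seed_windows in seeds.items():
        covered = set()
        for w in sorted(seed_windows):
            if w in covered:
                continue  # this seed lies inside a segment already reported
            lo = hi = w
            while lo > 0 and mismatches(a, b, lo - 1) <= max_mismatch:
                lo -= 1
            while hi + 1 < n_windows and mismatches(a, b, hi + 1) <= max_mismatch:
                hi += 1
            covered.update(range(lo, hi + 1))
            if hi - lo + 1 >= min_windows:
                results.append((a, b, lo * window, (hi + 1) * window))
    return sorted(results)
```

For example, calling `shared_segments({"A": "0101110000111010", "B": "0101110100110101"})` seeds on the first and third windows, where the two strings match exactly, and extends across the second window, which differs at one marker. Setting `max_mismatch=0` prevents that extension, so no segment reaches the `min_windows` threshold.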
Computer science
Genetics
Columbia University Computer Science Technical Reports
CUCS-027-08
http://hdl.handle.net/10022/AC:P:29583
English
NNC
NNC
2011-04-26 12:09:23 -0400
2012-04-13 11:37:07 -0400
3946
eng