Title: Whole Population, Genomewide Mapping of Hidden Relatedness
Authors: Gusev, Alexander (Columbia University. Computer Science); Lowe, Jennifer K.; Stoffel, Markus; Daly, Mark; Altshuler, David; Friedman, Jeffrey M.; Breslow, Jan L.; Pe'er, Itshack G. (Columbia University. Computer Science)
Originator, contributor: Columbia University. Computer Science
Type: text; Technical reports
Publisher: New York: Department of Computer Science, Columbia University, 2008

Abstract: The ability to identify and quantify genealogical relationships between individuals within a population is an important step in accurately using such data for disease analysis and improving our understanding of demography. However, exhaustive pairwise analysis, which has been successful in small cohorts, cannot keep up with the current torrent of genotype data. We present GERMLINE, a robust algorithm for identifying pairwise segmental sharing which scales linearly with the number of input individuals. Our approach is based on a dictionary of haplotypes, used to efficiently discover short exact matches between individuals and then expand these matches to identify long, nearly identical segmental sharing that is indicative of relatedness. We use GERMLINE to comprehensively survey hidden relatedness both in the HapMap and in a densely typed island population of 3,000 individuals. We verify that GERMLINE is in concordance with other methods when they can process the data, and that it also facilitates analysis of larger-scale studies. We also demonstrate novel applications of precise analysis of hidden relatedness to the detection of haplotype phasing errors and structural variation. We show that shared segment discovery can help identify phasing errors and potentially resolve them. Finally, we use detected identity of genomic segments to expose polymorphic deletions that are otherwise challenging to detect, with 8/14 deletions in the HapMap samples and 153/200 deletions in the island data having independent experimental validation.

Subjects: Computer science; Genetics
Series: Columbia University Computer Science Technical Reports, CUCS-027-08
Handle: http://hdl.handle.net/10022/AC:P:29583
Language: English
NNC NNC 2011-04-26 12:09:23 -0400 2012-04-13 11:37:07 -0400 3946 eng
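
Illustrative note: the abstract describes a two-step idea, a dictionary of haplotypes that finds short exact matches, followed by expansion of those matches into long, nearly identical segments. The Python sketch below is a toy rendering of that idea only and is not the GERMLINE implementation. The function names, the fixed-width windows, and the per-window mismatch threshold (`window`, `min_windows`, `max_mismatch`) are all assumptions made for illustration; the record does not specify any of them.

```python
from collections import defaultdict
from itertools import combinations


def window_ok(ha, hb, w, window, max_mismatch):
    """True if window w of the two haplotypes differs at no more than max_mismatch sites."""
    s = slice(w * window, (w + 1) * window)
    return sum(x != y for x, y in zip(ha[s], hb[s])) <= max_mismatch


def find_shared_segments(haplotypes, window=4, min_windows=2, max_mismatch=1):
    """Toy seed-and-extend search (all parameters are hypothetical).

    haplotypes: dict mapping a sample name to an equal-length allele string.
    Returns a list of (name_a, name_b, start_site, end_site_exclusive).
    """
    names = sorted(haplotypes)
    n_windows = len(haplotypes[names[0]]) // window

    # Seed step: hash each window's allele string so that samples carrying
    # the identical string fall into the same dictionary bucket.
    seeds = defaultdict(set)  # (a, b) -> indices of windows that match exactly
    for w in range(n_windows):
        buckets = defaultdict(list)
        for name in names:
            buckets[haplotypes[name][w * window:(w + 1) * window]].append(name)
        for group in buckets.values():
            for a, b in combinations(group, 2):
                seeds[(a, b)].add(w)

    # Extend step: grow each seed left and right while neighbouring windows
    # stay within the mismatch tolerance; keep only sufficiently long runs.
    segments = []
    for (a, b), windows in sorted(seeds.items()):
        ha, hb = haplotypes[a], haplotypes[b]
        covered_until = -1
        for w in sorted(windows):
            if w <= covered_until:
                continue  # this seed already lies inside an extended run
            lo = w
            while lo - 1 > covered_until and window_ok(ha, hb, lo - 1, window, max_mismatch):
                lo -= 1
            hi = w
            while hi + 1 < n_windows and window_ok(ha, hb, hi + 1, window, max_mismatch):
                hi += 1
            covered_until = hi
            if hi - lo + 1 >= min_windows:
                segments.append((a, b, lo * window, (hi + 1) * window))
    return segments


if __name__ == "__main__":
    toy = {
        "A": "0101110000111010",
        "B": "0101110100110101",
        "C": "1010001111000101",
    }
    print(find_shared_segments(toy))
```

In this toy input, A and B match exactly in two windows and differ at a single site in the window between them, so the extension step merges all three windows into one segment. B and C share only one isolated exact window, which falls below `min_windows` and is discarded. Building the buckets touches each sample once per window; enumerating pairs inside a bucket is the part whose cost depends on how much sharing is actually present.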