
Whole Population, Genomewide Mapping of Hidden Relatedness

Alexander Gusev; Jennifer K. Lowe; Markus Stoffel; Mark Daly; David Altshuler; Jeffrey M. Friedman; Jan L. Breslow; Itshack G. Pe'er

Type:
Technical reports
Department:
Computer Science
Series:
Columbia University Computer Science Technical Reports
Part Number:
CUCS-027-08
Publisher:
Department of Computer Science, Columbia University
Publisher Location:
New York
Abstract:
The ability to identify and quantify genealogical relationships between individuals within a population is an important step toward accurately using such data for disease analysis and toward improving our understanding of demography. However, exhaustive pairwise analysis, which has been successful in small cohorts, cannot keep up with the current torrent of genotype data. We present GERMLINE, a robust algorithm for identifying pairwise segmental sharing that scales linearly with the number of input individuals. Our approach is based on a dictionary of haplotypes, used to efficiently discover short exact matches between individuals and then to extend these matches into long, nearly identical shared segments that are indicative of relatedness. We use GERMLINE to comprehensively survey hidden relatedness both in the HapMap and in a densely typed island population of 3,000 individuals. We verify that GERMLINE agrees with other methods wherever those methods can process the data, while also enabling the analysis of larger-scale studies. We also demonstrate novel applications of precise analysis of hidden relatedness to the detection of haplotype phasing errors and structural variation. We show that shared segment discovery can help identify phasing errors and potentially resolve them. Finally, we use the detected identity of genomic segments to expose polymorphic deletions that are otherwise challenging to detect; 8 of 14 such deletions in the HapMap samples and 153 of 200 in the island data have independent experimental validation.
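The seed-and-extend idea described in the abstract can be illustrated with a small Python sketch. This is a minimal illustration under stated assumptions, not the GERMLINE implementation: it assumes phased haplotypes supplied as equal-length allele strings, cut into fixed-width windows, and the function `shared_segments` and its parameters `window`, `min_windows` and `max_gap` are hypothetical names chosen for illustration. "Nearly identical" is approximated here by bridging up to `max_gap` consecutive mismatching windows, which is a simplification rather than the report's actual tolerance rule.

```python
from collections import defaultdict
from itertools import combinations


def shared_segments(haplotypes, window=4, min_windows=3, max_gap=0):
    """Hypothetical sketch of dictionary-based shared-segment detection.

    haplotypes: list of equal-length allele strings, e.g. "0110...".
    Returns {(i, j): [(start_marker, end_marker), ...]} with i < j and
    end_marker exclusive. A segment is a run of exactly matching windows
    spanning >= min_windows windows, where gaps of up to max_gap
    consecutive mismatching windows are bridged (simplified "near identity").
    Markers beyond the last full window are ignored.
    """
    n_windows = len(haplotypes[0]) // window
    seeds = defaultdict(list)  # (i, j) -> window indices with an exact match

    # Step 1: per window, a dictionary from allele slice to haplotype ids.
    # Identical slices share one bucket, so exact matches are found without
    # comparing every pair of haplotypes.
    for w in range(n_windows):
        buckets = defaultdict(list)
        for idx, hap in enumerate(haplotypes):
            buckets[hap[w * window:(w + 1) * window]].append(idx)
        for members in buckets.values():
            for pair in combinations(members, 2):
                seeds[pair].append(w)  # windows arrive in ascending order

    # Step 2: extend seeds by merging nearby matching windows into runs.
    segments = {}
    for pair, wins in seeds.items():
        runs = []
        start = prev = wins[0]
        for w in wins[1:]:
            if w - prev - 1 <= max_gap:
                prev = w  # extend the current run, possibly across a gap
            else:
                runs.append((start, prev))
                start = prev = w
        runs.append((start, prev))
        # Keep only long runs, reported in marker coordinates.
        kept = [(s * window, (e + 1) * window)
                for s, e in runs if e - s + 1 >= min_windows]
        if kept:
            segments[pair] = kept
    return segments


if __name__ == "__main__":
    haps = [
        "000011110000111100001111",
        "000011110000111110100101",  # matches hap 0 in windows 0-3
        "111100001111000000001111",  # matches hap 0 only in windows 4-5
    ]
    print(shared_segments(haps))  # {(0, 1): [(0, 16)]}
```

Building the per-window dictionaries touches each haplotype once per window, so that part grows linearly with the number of haplotypes; the remaining cost depends on how many pairs actually share a slice, which is why short exact matches are used as seeds before any pairwise extension.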
Subject(s):
Computer science
Genetics
