Beijing Institute of Genomics, Chinese Academy of Sciences, No. 7 Bei Tu Cheng West Road, Beijing 100029, PR China

Department of Biostatistics, Mailman School of Public Health, 722 West 168th Street, Columbia University, New York, NY 10032, USA

Rockefeller University, Laboratory of Statistical Genetics, 1230 York Avenue, New York, NY 10065, USA

Abstract

Background

In human case-control association studies, one of the chi-square tests typically carried out is based on a 2 × 3 table of genotypes (homogeneity of three genotype frequencies in case and control individuals). We formulate the two degrees of freedom associated with a given genotype distribution in terms of two biologically relevant parameters, (1) the probability

Results

Imposing the restriction,

Conclusion

For dominant and recessive modes of inheritance, any apparent power gain by an allele test carried out in conjunction with a genotype test tends to be purchased entirely by an increased rate of false positive results, owing to the omission of a multiple testing correction. Compared with these two standard association tests, our FP test represents a convenient and more powerful alternative.

Background

In their well-known paper on homozygosity mapping published 20 years ago, Lander and Botstein

Even individuals seemingly collected at random from the population tend to exhibit extended regions of allele sharing

In recent years, researchers have shown renewed interest in extended segments of homozygosity and have generally done so by focusing on segments of specific lengths

Results

Statistical model for SNP genotype frequencies

Consider a SNP marker with two alleles, ^{2}, 2^{2 }(Hardy-Weinberg equilibrium, HWE). Then the genotype probabilities may be formulated as given in Table

Genotype parametrization

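| Genotype probability | Parametrization |
|---|---|
| $P_1$ | $Fp + (1-F)p^2$ |
| $P_2$ | $2(1-F)p(1-p)$ |
| $P_3$ | $F(1-p) + (1-F)(1-p)^2$ |
| Sum | 1 |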

Parametrization of genotype frequencies (

As there are equal numbers (i.e., 2) of free (independent) genotype classes and parameters, finding maximum likelihood estimates (MLEs) amounts to simply equating genotype frequencies with their probabilities of occurrence (Table _{1 }and _{2 }in Table

The MLEs of _{1}, _{2}, and _{3 }are simply the proportions of individuals with given genotypes. Because of the invariance property of MLEs, the functions _{1 }and _{2 }(equation 1) are also MLEs. The inbreeding coefficient _{2 }is smaller than expected under HWE then
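As a minimal sketch of this estimation step, the Python function below computes the allele frequency and the inbreeding coefficient from observed genotype counts. It assumes the standard inbreeding parametrization, in which the heterozygote probability is 2(1 - F)p(1 - p), so that p = P_1 + P_2/2 and F = 1 - P_2/[2p(1 - p)]. The function name, the argument names and the genotype labels AA/AB/BB are illustrative assumptions and do not come from the original text.

```python
def estimate_p_and_f(n_aa, n_ab, n_bb):
    """Unrestricted MLEs of allele frequency p and inbreeding coefficient F.

    Assumes (hypothetical labels) genotype counts for AA, AB and BB and the
    standard parametrization P(AB) = 2 * (1 - F) * p * (1 - p).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("at least one individual is required")

    # MLEs of the genotype probabilities are the observed proportions.
    p1 = n_aa / n
    p2 = n_ab / n

    # By the invariance property, functions of MLEs are also MLEs.
    p = p1 + p2 / 2.0
    if p == 0.0 or p == 1.0:
        raise ValueError("F is not identifiable for a monomorphic marker")
    f = 1.0 - p2 / (2.0 * p * (1.0 - p))
    return p, f


if __name__ == "__main__":
    # Heterozygote deficit relative to HWE gives a positive estimate of F.
    print(estimate_p_and_f(36, 28, 36))
```

A heterozygote excess gives a negative estimate, which is why unrestricted estimates of F can fall below zero.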

The inbreeding coefficient

So far, the expressions for the genotype frequencies in Table

Statistical test

We want to test the null hypothesis _{0 }of no association versus the alternative hypothesis _{1 }of association. Under _{0}, allele frequencies and _{1}, allele frequencies may be different between cases and controls and so may be _{1}. As outlined in detail in the Methods section, we formulate this test as a likelihood ratio (LR) test, the _{1}, _{2}, and _{3 }are the respective numbers of case individuals with genotypes _{i }are functions of _{a }and _{a }are the parameter values in case individuals. For control individuals, the log likelihood log [_{b}(_{b}, _{b})] is obtained in an analogous manner. The test statistic is T = 2{log [_{a}(_{a}, _{a})] + log [_{b}(_{b}, _{b})] - log [_{c}(_{c}, _{c})]}, where the subscript _{0}. However, because of the conditions imposed (
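The statistic T can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' implementation: it assumes the standard inbreeding parametrization of genotype probabilities, treats the restriction as F ≥ 0, and uses the fact that under this restriction the maximum is reached by truncating a negative unrestricted estimate of F at zero while keeping the allele-counting estimate of p. All function and argument names are hypothetical.

```python
import math


def genotype_probs(p, f):
    """Genotype probabilities (AA, AB, BB) for allele frequency p and inbreeding f."""
    q = 1.0 - p
    return (f * p + (1.0 - f) * p * p,
            2.0 * (1.0 - f) * p * q,
            f * q + (1.0 - f) * q * q)


def max_log_lik(counts, restrict=True):
    """Maximized multinomial log likelihood for genotype counts (AA, AB, BB).

    With restrict=True the estimate of F is truncated at 0 (assumed restriction).
    """
    n1, n2, n3 = counts
    n = n1 + n2 + n3
    if n == 0:
        raise ValueError("at least one individual is required")
    p = (2 * n1 + n2) / (2.0 * n)
    if p == 0.0 or p == 1.0:
        f = 0.0  # monomorphic sample: F is not identifiable, any value fits
    else:
        f = 1.0 - (n2 / n) / (2.0 * p * (1.0 - p))
    if restrict:
        f = max(f, 0.0)
    probs = genotype_probs(p, f)
    # Empty genotype classes contribute nothing to the log likelihood.
    return sum(k * math.log(pr) for k, pr in zip(counts, probs) if k > 0)


def lr_statistic(case, control, restrict=True):
    """T = 2 * {log L_a + log L_b - log L_c}, with c the pooled sample."""
    pooled = tuple(a + b for a, b in zip(case, control))
    return 2.0 * (max_log_lik(case, restrict)
                  + max_log_lik(control, restrict)
                  - max_log_lik(pooled, restrict))


if __name__ == "__main__":
    print(lr_statistic((40, 40, 20), (20, 50, 30)))
```

Because the restriction changes the null distribution of T, the value returned here should be compared with a simulated or permutation-based reference distribution rather than with a chi-square quantile.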

Power calculations

To compare the performance of our new test with that of existing tests, we carry out power calculations under a recessive and a dominant model of disease inheritance, where we assume a functional SNP fully associated with the disease variant. Model parameters (penetrances and disease allele frequencies) are calibrated to predict a trait prevalence of 5% for each model. The proportion of affected individuals in the population whose disease is due to the given gene is fixed at 10%. The "strength" of a model is measured by the penetrance ratio,

We compare three tests, our _{0}.

Figure

**Power for recessive disease models**. Power (y-axis) as a function of the penetrance ratio,

**Power for dominant disease models**. Power (y-axis) as a function of the penetrance ratio,

Under both dominant and recessive models, the

We also considered Risch's genotype relative risk model ^{2}

Application to published data

We applied our

Test results for observed data

| Data | SNP | $p_{genotype}$ | $p_{FP}$ | $p_{allele}$ | $F_{case}$ | $F_{control}$ |
|---|---|---|---|---|---|---|
| AMD | rs380390 | 0.0380 | 0.0090 | 0.0056 | 0.215 | -0.073 |
| | rs10272438 | 1.0000 | 0.9068 | 0.0194 | 0.733 | 0.611 |
| AMD HK | rs10490924 | 0.0002 | 0.0002 | 0.0002 | 0.243 | -0.062 |
| | rs10504152 | 0.0058 | 0.1286 | 0.2222 | 0.132 | -0.271 |
| | rs584244 | 0.1824 | 0.0996 | 0.1010 | -0.011 | -0.149 |
| PD | rs9952724 | 0.0004 | 0.0002 | 1.0000 | 0.788 | 0.022 |
| | rs850084 | 0.0022 | 0.0002 | 0.9932 | 0.828 | 0.243 |
| | rs10963676 | 0.0058 | 0.0004 | 0.0072 | 0.817 | 0.086 |
| | rs4746675 | 0.0062 | 0.0004 | 1.0000 | 0.839 | 0.048 |
| | rs557074 | 0.0068 | 0.0012 | 1.0000 | 0.736 | 0.029 |
| | rs1504212 | 0.0088 | 0.0014 | 1.0000 | 0.494 | -0.023 |
| | rs12364577 | 0.0174 | 0.0020 | 1.0000 | 0.519 | 0.014 |
| | rs1468375 | 0.0240 | 0.0042 | 0.0002 | 0.452 | -0.040 |

Results of the genotype,

For each of the three studies, Table 3 lists all SNPs that achieved an experiment-wise significance level of 0.05 or less in at least one of the three association tests. Of the 13 resulting SNPs, 11 show a smaller

Discussion

As mentioned in the introduction, researchers often look for genomic regions of increased homozygosity or autozygosity by sliding a window of fixed length across the genome. Our test offers an elegant alternative to such windows of fixed and arbitrary lengths. We propose to work with scan statistics as previously developed

It is interesting to note unrestricted values of

The null distribution of the

Conclusion

At least for the recessive and dominant models considered here, our

Methods

The restriction of _{1 }and _{3}. Simple algebraic manipulation (not shown here) of equations (1) demonstrates that 0 ≤ _{2}. Figure _{1 }+ _{3 }≤ 1, the unrestricted parameter space corresponds to the lower triangle in Figure

**Parameter space for SNP genotype frequencies**. Parameters _{1 }and _{3 }are frequencies for genotypes

The resulting restricted parameter space corresponds to the area marked "F > 0" in Figure

For power calculations, we assume disease models with two alleles and three genotypes with penetrances $f_1$, $f_2$, and $f_3$, where we set $f_2 = f_1$ for recessive models and $f_2 = f_3$ for dominant models. The "strength" of a model is measured by the penetrance ratio, $R = f_3/f_1$, which is very approximately equal to the odds ratio. Thus, we have three genetic parameters, $R$, $f_1$, and the disease allele frequency $p$. The trait prevalence is given as

$R f_1 p^2 + f_1(1 - p^2)$ for recessive traits and as

$(1-p)^2 f_1 + [1-(1-p)^2] R f_1$ for dominant traits.

Also, the proportion of genetic cases among all affected individuals is

$R p^2/(R p^2 + 1 - p^2)$ for recessive traits and

$R[1-(1-p)^2]/[R(1-(1-p)^2) + (1-p)^2]$ for dominant traits.

Fixing
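The calibration described here can be sketched in Python. The sketch assumes that the prevalence and the proportion of genetic cases take the forms f_1[Rg + (1 - g)] and Rg/[Rg + (1 - g)], where g is the population frequency of the susceptible genotype class (p^2 for recessive, 1 - (1 - p)^2 for dominant models), R is the penetrance ratio and f_1 the baseline penetrance. The function name, argument names and the symbol g are illustrative assumptions.

```python
import math


def calibrate_model(model, ratio, prevalence=0.05, genetic_fraction=0.10):
    """Solve for disease allele frequency p and baseline penetrance f1.

    model            : "recessive" or "dominant"
    ratio            : penetrance ratio R (assumed R = f3 / f1)
    prevalence       : target trait prevalence
    genetic_fraction : target proportion of genetic cases among affecteds
    """
    if model not in ("recessive", "dominant"):
        raise ValueError("model must be 'recessive' or 'dominant'")

    # Frequency g of the susceptible genotype class, obtained by solving
    # genetic_fraction = R * g / (R * g + 1 - g) for g.
    g = genetic_fraction / (ratio * (1.0 - genetic_fraction) + genetic_fraction)

    if model == "recessive":
        p = math.sqrt(g)              # g = p^2
    else:
        p = 1.0 - math.sqrt(1.0 - g)  # g = 1 - (1 - p)^2

    # Baseline penetrance from prevalence = f1 * (R * g + 1 - g).
    f1 = prevalence / (ratio * g + 1.0 - g)
    return p, f1


if __name__ == "__main__":
    for r in (1, 2, 4, 8):
        print(r, calibrate_model("recessive", r), calibrate_model("dominant", r))
```

With a penetrance ratio of 1 the baseline penetrance equals the prevalence, which corresponds to the null setting of no association.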

For each of dominant and recessive models, with a value of the penetrance ratio, ^{th }percentile of the computer-generated null distribution of the test statistic as the critical limit). Then power is determined for penetrance ratios ranging from 1 through 8. All power calculations were carried out based on 5,000 replicates.
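The last two steps, taking a percentile of a simulated null distribution as the critical limit and counting rejections among replicates generated under an alternative, can be sketched as follows. The helper names and the default percentile level are assumptions chosen for illustration; the statistics passed in would come from whichever test is being evaluated.

```python
import math


def empirical_critical_value(null_stats, level=0.95):
    """Return the given percentile of simulated null test statistics.

    level is an assumed default; it should match the intended significance level.
    """
    if not null_stats:
        raise ValueError("null_stats must not be empty")
    ordered = sorted(null_stats)
    # Smallest order statistic with at least level * n values at or below it.
    idx = min(len(ordered) - 1, max(0, math.ceil(level * len(ordered)) - 1))
    return ordered[idx]


def estimate_power(alt_stats, critical_value):
    """Proportion of replicates whose statistic exceeds the critical limit."""
    if not alt_stats:
        raise ValueError("alt_stats must not be empty")
    return sum(1 for s in alt_stats if s > critical_value) / len(alt_stats)


if __name__ == "__main__":
    null_stats = [0.1, 0.4, 0.9, 1.3, 2.0, 2.7, 3.1, 3.9, 4.5, 6.2]
    limit = empirical_critical_value(null_stats, level=0.5)
    print(limit, estimate_power([1.0, 2.5, 5.0, 7.0], limit))
```

In practice both lists would hold one statistic per simulated replicate, for example 5,000 values each, generated with a penetrance ratio of 1 for the null distribution and with the ratio of interest for the alternative.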

Authors' contributions

QZ participated in study design and programming, SW formulated the parametrization of SNP genotypes, and JO participated in study design and wrote the manuscript.

Acknowledgements

This work was supported by China NSFC grants, project numbers 30730057 (JO) and 30700442 (QRZ), and by grant MH44292 (JO) from the U.S. National Institute of Mental Health. This study used data from the SNP Database at the NINDS Human Genetics Resource Center DNA and Cell Line Repository