
Abstract

Methods to impute missing data are routinely used to increase power in genome-wide association studies. There are two broad classes of imputation methods. The first class imputes genotypes at the untyped variants, given those at the typed variants, and then performs a statistical test of association at the imputed variants. The second class, summary statistic imputation (SSI), directly imputes association statistics at the untyped variants, given the association statistics observed at the typed variants. The second class is appealing because it tends to be computationally efficient and requires only the summary statistics from a study, whereas the first class requires access to individual-level data, which can be difficult to obtain. The statistical properties of these two classes of imputation methods are not fully understood. In this study, we show that the two classes yield association statistics with similar distributions for sufficiently large sample sizes. Using this relationship, we can characterize the effect of the imputation method on power. We show that a commonly used approach to SSI, which we term SSI with variance reweighting, generally leads to a loss in power. In contrast, our proposed method for SSI, which does not perform variance reweighting, fully accounts for imputation uncertainty while achieving better power.
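To make the two SSI variants concrete, the following is a minimal sketch and not the authors' implementation. The abstract gives no formulas, so the sketch assumes the conditional-Gaussian form commonly used for summary statistic imputation: the imputed z-score is a linear combination of the typed z-scores, weighted by linkage disequilibrium (LD) correlations, and variance reweighting divides that value by the square root of its variance under the null. All names (`impute_z`, `reweight`, `ld_tt`, `ld_ut`) and all numbers are hypothetical.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: bring the largest entry in this column to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def impute_z(z_typed, ld_tt, ld_ut):
    """Impute the z-score at one untyped variant (assumed Gaussian model).

    z_typed: z-scores at the typed variants
    ld_tt:   LD correlation matrix among the typed variants
    ld_ut:   LD correlations between the untyped variant and each typed variant
    Returns (z_imputed, r2), where r2 is the variance of z_imputed under the null.
    """
    w = solve(ld_tt, ld_ut)  # weights w = ld_tt^{-1} ld_ut (ld_tt is symmetric)
    z_imputed = sum(wi * zi for wi, zi in zip(w, z_typed))
    r2 = sum(wi * ci for wi, ci in zip(w, ld_ut))
    return z_imputed, r2


def reweight(z_imputed, r2):
    """Variance reweighting: rescale the imputed statistic to unit null variance."""
    return z_imputed / math.sqrt(r2)


# Hypothetical toy data: two typed variants, one untyped variant.
ld_tt = [[1.0, 0.5], [0.5, 1.0]]
ld_ut = [0.8, 0.6]
z_typed = [3.0, 2.0]

z_imp, r2 = impute_z(z_typed, ld_tt, ld_ut)
z_rw = reweight(z_imp, r2)
print(z_imp, r2, z_rw)
```

In this sketch `r2` is below 1 whenever the typed variants do not perfectly predict the untyped one, so the reweighted statistic is always larger in magnitude than the raw imputed value. How the proposed method calibrates the unreweighted statistic is not stated in the abstract and is not reproduced here.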




A Unifying Framework for Imputing Summary Statistics in Genome-Wide Association Studies
