Sage Journals: Discover world-class research

Abstract

Large-scale cohorts with combined genetic and phenotypic data, coupled with methodological advances, have produced increasingly accurate genetic predictors of complex human phenotypes, called polygenic risk scores (PRSs). Beyond their potential translational impact in identifying at-risk individuals, PRSs are being used in a growing list of scientific applications, including causal inference, detection of pleiotropy and genetic correlation, and powerful gene-based and mixed-model association tests. Existing PRS approaches rely on external large-scale genetic cohorts in which the phenotype of interest has also been measured, and they further require the external and target cohorts to be matched on ancestry and on genotyping platform or imputation quality. In this work, we present a novel reference-free method that produces a PRS without relying on an external cohort. We show that naive implementations of reference-free PRSs either overfit substantially or incur prohibitive increases in computational time, and that our algorithm avoids both problems, producing informative in-sample PRSs within a single cohort without overfitting. We then demonstrate several novel applications of reference-free PRSs, including detection of pleiotropy across 246 metabolic traits and efficient mixed-model association testing.
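The trade-off between overfitting and computational cost can be illustrated with a toy example. The sketch below assumes, purely for illustration, that the PRS is a ridge-regression (BLUP-style) predictor fit jointly on all SNPs; it is not necessarily the algorithm used in the paper. Using simulated genotypes and a phenotype with no genetic signal, it compares in-sample scores (each person's own phenotype contributes to their score), naive leave-one-out scores from n separate refits, and the standard closed-form leave-one-out identity for ridge regression, which needs a single fit. The function names `in_sample_and_loo_prs` and `naive_loo_prs` and all simulation parameters are hypothetical.

```python
import numpy as np


def in_sample_and_loo_prs(X, y, lam):
    """Ridge (BLUP-style) PRS for every individual: in-sample and leave-one-out.

    Uses the kernel form of ridge regression, H = K (K + lam*I)^{-1} with
    K = X X^T, so the cost is one n x n solve regardless of the number of SNPs.
    """
    n = X.shape[0]
    K = X @ X.T                                   # n x n genetic similarity (unscaled GRM)
    H = np.linalg.solve(K + lam * np.eye(n), K)   # hat matrix; K and (K + lam*I) commute
    y_hat = H @ y                                 # in-sample PRS: y_i helps predict itself
    h = np.diag(H)                                # leverage of each individual
    # Exact leave-one-out identity for ridge regression: no refitting needed.
    y_loo = (y_hat - h * y) / (1.0 - h)
    return y_hat, y_loo


def naive_loo_prs(X, y, lam):
    """Reference implementation: refit the ridge model n times, dropping one person each time."""
    n = X.shape[0]
    K = X @ X.T
    preds = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        # Kernel-form ridge fit on the n - 1 retained individuals
        alpha = np.linalg.solve(K[np.ix_(keep, keep)] + lam * np.eye(n - 1), y[keep])
        preds[i] = K[i, keep] @ alpha             # score for the held-out individual
    return preds


# Toy data (hypothetical parameters): independent SNPs, phenotype with NO genetic signal.
rng = np.random.default_rng(0)
n, p, lam = 200, 1000, 1000.0
maf = rng.uniform(0.1, 0.5, size=p)
G = rng.binomial(2, maf, size=(n, p)).astype(float)   # genotype dosages 0/1/2
X = (G - G.mean(axis=0)) / G.std(axis=0)               # standardize each SNP
y = rng.standard_normal(n)

y_hat, y_loo = in_sample_and_loo_prs(X, y, lam)
r_in = np.corrcoef(y, y_hat)[0, 1]
r_loo = np.corrcoef(y, y_loo)[0, 1]
print(f"in-sample r = {r_in:.2f}, leave-one-out r = {r_loo:.2f}")
print("shortcut matches n refits:", np.allclose(y_loo, naive_loo_prs(X, y, lam)))
```

Even though the simulated phenotype is pure noise, the in-sample score correlates strongly with it, which is the overfitting problem; the leave-one-out score shows no such spurious correlation. The closed-form scores agree with the n explicit refits to numerical precision while replacing n linear solves with one.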

Efficient Estimation and Applications of Cross-Validated Genetic Predictions to Polygenic Risk Scores and Linear Mixed Models
