A Bayesian genomic selection approach incorporating prior feature ordering and population structures with application to coronary artery disease

Abstract

Coronary artery disease is one of the most common types of cardiovascular disease. Death from coronary heart disease is influenced by genetic factors in both women and men. In this article, we propose a novel Bayesian variable selection framework for identifying important genetic variants associated with coronary artery disease status. Conventional Bayesian variable selection methods treat each feature independently. Instead, we propose an innovative prior for the inclusion probabilities of genetic variants that accounts for their ordering structure. We assume that neighboring variants are more likely to be selected together, as they tend to be highly correlated and to have similar biological functions. Additionally, we propose to group participating subjects based on underlying population structure and to fit separate regressions, so that the regression coefficients can better reflect different disease risks in different population groups. Our approach borrows strength across these regression models through an innovative prior inspired by Markov random fields. Simulation studies demonstrate that the proposed framework can improve variable selection and prediction performance. We also apply the proposed framework to the CATHeterization GENetics (CATHGEN) data with binary coronary artery disease status.
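
The abstract does not spell out the functional form of the prior. One common way to encode both ideas (neighboring variants tend to be selected together, and population groups borrow strength from each other) is an Ising-type Markov random field on binary inclusion indicators. The Python sketch below assumes that form. The parameters `a` (sparsity), `b` (within-group neighbor coupling) and `c` (cross-group coupling), and both function names, are hypothetical illustrations, not the authors' specification.

```python
import math
from typing import List

# gamma[g][j] = 1 if variant j (in genomic order) is included in the
# regression for population group g, else 0.
Gamma = List[List[int]]


def log_prior_unnormalized(gamma: Gamma, a: float, b: float, c: float) -> float:
    """Unnormalized log of an ASSUMED Ising-type MRF prior (hypothetical form):
    a * (number of included variants, summed over groups)
    + b * (adjacent variant pairs both included within the same group)
    + c * (group pairs that both include the same variant)."""
    n_groups, p = len(gamma), len(gamma[0])
    total = 0.0
    for g in range(n_groups):
        total += a * sum(gamma[g])
        # Ordering structure: with b > 0, joint selection of variants j and j+1 is rewarded.
        total += b * sum(gamma[g][j] * gamma[g][j + 1] for j in range(p - 1))
    # Cross-group coupling: with c > 0, groups borrow strength from each other.
    for g in range(n_groups):
        for h in range(g + 1, n_groups):
            total += c * sum(gamma[g][j] * gamma[h][j] for j in range(p))
    return total


def prior_inclusion_prob(gamma: Gamma, g: int, j: int,
                         a: float, b: float, c: float) -> float:
    """Prior conditional P(gamma[g][j] = 1 | all other indicators) under the
    assumed prior above, i.e. the prior part of a single-site Gibbs update."""
    p = len(gamma[0])
    # Count included immediate neighbors of variant j within group g.
    neighbors = (gamma[g][j - 1] if j > 0 else 0) + (gamma[g][j + 1] if j < p - 1 else 0)
    # Count other groups that include the same variant j.
    shared = sum(gamma[h][j] for h in range(len(gamma)) if h != g)
    log_odds = a + b * neighbors + c * shared
    return 1.0 / (1.0 + math.exp(-log_odds))


# Usage: two population groups, five ordered variants.
gamma = [[0, 1, 0, 1, 0],
         [0, 1, 0, 0, 0]]
print(prior_inclusion_prob(gamma, 0, 2, a=-2.0, b=1.0, c=0.5))  # both neighbors included
print(prior_inclusion_prob(gamma, 1, 3, a=-2.0, b=1.0, c=0.5))  # included in the other group only
```

With `b > 0`, including a variant raises the prior odds of including its immediate neighbors in the same group; with `c > 0`, including it in one population group raises its prior odds in the others. In a Gibbs sampler, this prior log-odds would be combined with the likelihood contribution for each indicator.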

Keywords

Coronary artery disease; Bayesian variable selection; Markov random fields prior; CATHGEN


