Sage Journals: Discover world-class research

Abstract

Statistical approaches to population structure estimation have been driven predominantly by a single data type, single-nucleotide polymorphisms (SNPs). However, when the SNP data only weakly identify the underlying structure, estimation accuracy can suffer. Copy number variations (CNVs) are genomic structural variants whose loci are commonly shared within a specific population, and they therefore provide valuable information for estimating the ancestry of sampled populations. We develop a Bayesian framework for joint modeling of SNPs and CNVs, called POPSTR, to infer population structure more accurately than approaches that use SNPs alone. To deal with the increased data volume, we use the Metropolis-adjusted Langevin algorithm (MALA), which uses the gradient of the target distribution to guide proposals in a computationally efficient way. We illustrate applications of our approach using the HapMap 2005 project data. We carry out simulation studies and show that the performance of our approach is comparable to or better than that of popular benchmarks, STRUCTURE and ADMIXTURE. We also observe that using only CNVs can be remarkably efficient if SNP data are not available.
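To make the MALA step mentioned in the abstract concrete, here is a minimal sketch of a generic MALA sampler. It is not the POPSTR implementation: the target is a toy standard bivariate normal chosen for illustration, and the names `mala`, `log_target`, `grad_log_target`, and `step` are assumptions introduced here, not identifiers from the article.

```python
import numpy as np


def mala(log_target, grad_log_target, x0, step, n_iter, seed=0):
    """Generic Metropolis-adjusted Langevin sampler (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    samples = np.empty((n_iter, x.size))
    accepted = 0

    def log_q(x_to, x_from):
        # Log density (up to a constant) of proposing x_to from x_from.
        mean = x_from + 0.5 * step * grad_log_target(x_from)
        return -np.sum((x_to - mean) ** 2) / (2.0 * step)

    for i in range(n_iter):
        # Langevin proposal: drift along the gradient plus Gaussian noise.
        mean = x + 0.5 * step * grad_log_target(x)
        proposal = mean + np.sqrt(step) * rng.standard_normal(x.size)

        # Metropolis-Hastings correction for the asymmetric proposal.
        log_alpha = (log_target(proposal) - log_target(x)
                     + log_q(x, proposal) - log_q(proposal, x))
        if np.log(rng.uniform()) < log_alpha:
            x = proposal
            accepted += 1
        samples[i] = x

    return samples, accepted / n_iter


# Toy target (an assumption for illustration): standard bivariate normal.
def log_target(x):
    return -0.5 * np.sum(x ** 2)


def grad_log_target(x):
    return -x


samples, acc_rate = mala(log_target, grad_log_target,
                         x0=[3.0, -3.0], step=0.5, n_iter=5000)
```

The gradient drift pulls proposals toward regions of higher posterior density, which is why MALA typically mixes faster than a plain random-walk Metropolis sampler as the number of parameters grows. In a real application the step size would need tuning, and the target would be the model's posterior rather than this toy density.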


References

1. Alexander, D.H., Novembre, and Lange. 2009. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 19, 1655–1664.

2. Atchade. 2006. An adaptive version for the Metropolis adjusted Langevin algorithm with a truncated drift. Methodol. Comput. Appl. Probab. 8, 235–254.

3. Balding. 2006. A tutorial on statistical methods for population association studies. Nat. Rev. Genet. 7, 781–791.

4. Campbell, C.D., Ogburn, E.L., Lunetta, K.L., et al. 2005. Demonstrating stratification in a European American population. Nat. Genet. 37, 868–872.

5. Campbell, C.D., Sampas, Tsalenko, et al. 2011. Population-genetic properties of differentiated human copy-number polymorphisms. Am. J. Hum. Genet. 88, 317–332.

6. Colobran, Comas, Faner, et al. 2008. Population structure in copy number variation and SNPs in the CCL4L chemokine gene. Genes Immun. 9, 279–288.

7. Conrad, Pinto, Redon, et al. 2009. Origins and functional impact of copy number variation in the human genome. Nature 464, 704–712.

8. Falush, Stephens, and Pritchard, J.K. 2003. Inference of population structure using multi-locus genotype data, linked loci, and correlated allele frequencies. Genetics 164, 1567–1587.

9. Gattepaille, L.M., and Jakobsson. 2013. Inferring population size changes with sequence and SNP data: Lessons from human bottlenecks. Heredity 110, 409–419.

10. Gelman, and Rubin, D.B. 1992. Inference from iterative simulation using multiple sequences. Stat. Sci. 7, 457–511.

11. Hellenthal, Busby, G.B.J., Wilson, J.F., et al. 2014. A genetic atlas of human admixture history. Science 343, 747–751.

12. Jakobsson, Scholz, Scheet, et al. 2008. Genotype, haplotype and copy-number variation in worldwide human populations. Nature 451, 998–1003.

13. McCarroll, S.A., Kuruvilla, F.G., Korn, J.M., et al. 2008. Integrated detection and population-genetic analysis of SNPs and copy number variation. Nat. Genet. 40, 1166–1174.

14. Novembre, and Stephens. 2008. Interpreting principal component analyses of spatial population genetic variation. Nat. Genet. 40, 646–649.

15. Patterson, Price, and Reich. 2006. Population structure and eigenanalysis. PLoS Genet. 2, e190.

16. Porras-Hurtado, Ruiz, Santos, et al. 2013. An overview of STRUCTURE: Applications, parameter settings, and supporting software. Front. Genet. 4, 98.

17. Price, Butler, Patterson, et al. 2008. Discerning the ancestry of European Americans in genetic association studies. PLoS Genet. 4, e236.

18. Price, Patterson, Plenge, et al. 2006. Principal components analysis corrects for stratification in genome-wide association studies. Nat. Genet. 38, 904–909.

19. Pritchard, J.K., Stephens, and Donnelly. 2000. Inference of population structure using multilocus genotype data. Genetics 155, 945–959.

20. Pronold, Vali, Pique-Regi, et al. 2012. Copy number variation signature to predict human ancestry. BMC Bioinformatics 13, 336.

21. Raj, Stephens, and Pritchard, J.K. 2014. fastSTRUCTURE: Variational inference of population structure in large SNP data sets. Genetics 197, 573–589.

22. Redon, Ishikawa, Fitch, K.R., et al. 2006. Global variation in copy number in the human genome. Nature 444, 444–454.

23. Roberts, G.O., and Rosenthal, J.S. 2001. Optimal scaling for various Metropolis-Hastings algorithms. Stat. Sci. 16, 351–367.

24. Roberts, G.O., and Tweedie. 1996. Exponential convergence of Langevin distributions and their discrete approximations. Bernoulli 2, 341–363.

25. Rosenberg, Pritchard, Weber, et al. 2002. Genetic structure of human populations. Science 298, 2381–2385.

26. Rousseeuw. 1987. Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math. 20, 53–65.

27. Schwarz. 1978. Estimating the dimension of a model. Ann. Stat. 6, 461–464.

28. Shringarpure, and Xing, E.P. 2009. mStruct: Inference of population structure in light of both genetic admixing and allele mutations. Genetics 182, 575–593.

29. Sohn, Ghahramani, and Xing, E.P. 2012. Robust estimation of local genetic ancestry in admixed populations using a nonparametric Bayesian approach. Genetics 191, 1295–1308.

30. Tang, Peng, Wang, et al. 2005. Estimation of individual admixture: Analytical and study design considerations. Genet. Epidemiol. 28, 289–301.

31. The International HapMap Consortium. 2005. A haplotype map of the human genome. Nature 437, 1299–1320.

32. Tian, Gregersen, P.K., and Seldin, M.F. 2008. Accounting for ancestry: Population substructure and genome-wide association studies. Hum. Mol. Genet. 17, 143–150.

33. Wang, Ray, Rojas, et al. 2008. Geographic patterns of genome admixture in Latin American Mestizos. PLoS Genet. 4, e1000037.


POPSTR: Inference of Admixed Population Structure Based on Single-Nucleotide Polymorphisms and Copy Number Variations
