
Abstract

A phase transition is taking place today. The amount of data generated by genome resequencing technologies is so large that in some cases it is now less expensive to repeat the experiment than to store the information it generates. In the next few years, it is quite possible that millions of Americans will have been genotyped. The question then arises of how to make the best use of this information and jointly estimate the haplotypes of all these individuals. The premise of this article is that long shared genomic regions (or tracts) are unlikely unless the haplotypes are identical by descent. These tracts can be used as input for a Clark-like phasing method to obtain a phasing solution of the sample. We show on simulated data that the algorithm obtains an almost perfect solution when the number of genotyped individuals is large enough, and that its correctness grows with the number of individuals genotyped. We also study a related problem that connects copy number variation with phasing algorithm success. A loss of heterozygosity (LOH) event occurs when an individual who, by the laws of Mendelian inheritance, should be heterozygous is not, owing to a deletion polymorphism. Such polymorphisms are difficult to detect using existing algorithms, but they play an important role in the genetics of disease and will confuse haplotype phasing algorithms if not accounted for. We present an algorithm for detecting LOH regions across the genomes of thousands of individuals. The design of the long-range phasing algorithm and of the LOH inference algorithms was inspired by our analysis of the Multiple Sclerosis (MS) GWAS dataset of the International Multiple Sclerosis Genetics Consortium. We present results similar to those obtained from the MS data.
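The premise above can be illustrated with a minimal sketch. It is not the article's algorithm; it only shows the two ideas the abstract names. It assumes genotypes are coded per SNP as the count of the alternate allele (0, 1, or 2), treats a run of sites with no opposite homozygotes as a candidate shared tract, and then applies a Clark-style step that subtracts a known haplotype from a genotype. The function names `shared_tracts` and `clark_resolve` and the `min_len` threshold are hypothetical, chosen for illustration only.

```python
from typing import List, Optional, Tuple

# Assumed encoding: per-SNP count of the alternate allele (0, 1 = heterozygous, 2).
Genotype = List[int]
Haplotype = List[int]  # one allele (0 or 1) per SNP


def compatible(a: int, b: int) -> bool:
    """Two genotypes can share a haplotype at a site unless they are opposite homozygotes."""
    return {a, b} != {0, 2}


def shared_tracts(g1: Genotype, g2: Genotype, min_len: int) -> List[Tuple[int, int]]:
    """Return half-open [start, end) runs of at least min_len consecutive compatible sites."""
    tracts = []
    start = 0
    for i in range(len(g1) + 1):
        # A run ends at the end of the data or at the first incompatible site.
        if i == len(g1) or not compatible(g1[i], g2[i]):
            if i - start >= min_len:
                tracts.append((start, i))
            start = i + 1
    return tracts


def clark_resolve(genotype: Genotype, known_hap: Haplotype) -> Optional[Haplotype]:
    """Clark-style step: if known_hap is one of the two haplotypes of this genotype,
    return the complementary haplotype; return None if they are inconsistent."""
    other = []
    for g, h in zip(genotype, known_hap):
        c = g - h
        if c not in (0, 1):
            return None
        other.append(c)
    return other


if __name__ == "__main__":
    g1 = [0, 1, 2, 1, 0, 2]
    g2 = [0, 0, 2, 2, 2, 2]
    print(shared_tracts(g1, g2, min_len=3))
    print(clark_resolve([1, 2, 0, 1], [1, 1, 0, 0]))
```

In this toy setting a tract of a few sites means nothing; the abstract's argument is that only long tracts are unlikely without identity by descent, so any realistic threshold would be far larger and would need to account for genotyping error. A site where a supposedly shared haplotype is inconsistent with the genotype (the `None` case) is also the kind of signal that a deletion causing apparent homozygosity could produce, although the sketch does not attempt LOH inference.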



The Clark Phaseable Sample Size Problem: Long-Range Phasing and Loss of Heterozygosity in GWAS
