Abstract

Haplotype assembly is the task of directly constructing the haplotypes of an individual from sequence fragments (reads) of that individual. Although a number of programs have been designed to compute optimal or heuristic solutions to the haplotype assembly problem, computing an optimal solution may take days or even months, while computing a heuristic solution usually requires a trade-off between speed and accuracy. This article refines a previously known integer linear programming-based (ILP-based) approach to the haplotype assembly problem in two ways. First, the read-matrices of some datasets (such as NA12878) come with a quality value for each base in the reads, and we propose to utilize these qualities in the ILP-based approach. Second, we propose to use the ILP-based approach to improve the output of any heuristic program for the problem. Experiments with both real and simulated datasets show that the base qualities in the read-matrices help find more accurate solutions without significant loss of speed. Moreover, our experimental results show that the proposed hybrid approach significantly improves the output of ReFHap (the current leading heuristic), by almost 25% in QAN50 score, without significant loss of speed, and can even find optimal solutions in much less time than the original ILP-based approach. Our program is available from the authors upon request.

Better ILP-Based Approaches to Haplotype Assembly
