This paper studies haplotype inference by maximum parsimony using population data. We
define the optimal haplotype inference (OHI) problem as given a set of genotypes and a
set of related haplotypes, find a minimum subset of haplotypes that can resolve all the
genotypes. We prove that OHI is NP-hard and can be formulated as an integer quadratic
programming (IQP) problem. To solve the IQP problem, we propose an iterative semidefinite
programming-based approximation algorithm, (called SDPHapInfer). We show that
this algorithm finds a solution within a factor of O(log n) of the optimal solution, where n
is the number of genotypes. This algorithm has been implemented and tested on a variety
of simulated and biological data. In comparison with three other methods, (1) HAPAR,
which was implemented based on the branching and bound algorithm, (2) HAPLOTYPER,
which was implemented based on the expectation-maximization algorithm, and (3) PHASE,
which combined the Gibbs sampling algorithm with an approximate coalescent prior, the
experimental results indicate that SDPHapInfer and HAPLOTYPER have similar error
rates. In addition, the results generated by PHASE have lower error rates on some data
but higher error rates on others. The error rates of HAPAR are higher than the others on
biological data. In terms of efficiency, SDPHapInfer, HAPLOTYPER, and PHASE output
a solution in a stable and consistent way, and they run much faster than HAPAR when the
number of genotypes becomes large.