A Preprocessing Procedure for Haplotype Inference by Pure Parsimony

Haplotype data are especially important in the study of complex diseases because they contain more information than genotype data. However, obtaining haplotype data experimentally is technically difficult and costly. Computational methods have proved to be an effective way of inferring haplotype data from genotype data. One of these methods, haplotype inference by pure parsimony (HIPP), casts the task as an optimization problem, which has been proved to be NP-hard. We have designed and developed a new preprocessing procedure for this problem. Our proposed algorithm works with groups of haplotypes rather than individual haplotypes. It iteratively searches for and deletes haplotypes that are not needed to find the optimal solution. This preprocessing procedure can be coupled with any of the current HIPP solvers that need to preprocess the genotype data. To test it, we used two state-of-the-art solvers, RTIP and GAHAP, on both simulated and real HapMap data. Because our preprocessing reduces computational time and memory use, problem instances that were previously unaffordable can now be solved efficiently.
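
To make the HIPP problem statement concrete, the following is a minimal sketch of the optimization itself, not of the preprocessing procedure described above. It assumes the common encoding in which each genotype site is 0 or 1 (homozygous) or 2 (heterozygous); the function names `compatible_pairs` and `hipp_brute_force` are hypothetical and chosen only for illustration. The exhaustive search is exponential and is meant only for toy instances.

```python
from itertools import combinations, product


def compatible_pairs(genotype):
    """Return all unordered haplotype pairs that explain a genotype.

    Assumed encoding: 0 and 1 are homozygous sites, 2 is heterozygous.
    At a heterozygous site the two haplotypes must carry different alleles.
    """
    het_sites = [i for i, g in enumerate(genotype) if g == 2]
    pairs = set()
    for bits in product((0, 1), repeat=len(het_sites)):
        h1, h2 = list(genotype), list(genotype)
        for site, bit in zip(het_sites, bits):
            h1[site] = bit
            h2[site] = 1 - bit
        pairs.add(frozenset((tuple(h1), tuple(h2))))
    return pairs


def hipp_brute_force(genotypes):
    """Find a smallest haplotype set explaining every genotype (toy sizes only)."""
    per_genotype = [compatible_pairs(g) for g in genotypes]
    candidates = sorted({h for pairs in per_genotype for p in pairs for h in p})
    # Try candidate sets in order of increasing size; the first hit is optimal.
    for k in range(1, len(candidates) + 1):
        for subset in combinations(candidates, k):
            chosen = set(subset)
            if all(any(p <= chosen for p in pairs) for pairs in per_genotype):
                return chosen
    return set()


if __name__ == "__main__":
    solution = hipp_brute_force([(2, 2), (0, 2), (1, 1)])
    print(sorted(solution))
```

The candidate set grows exponentially with the number of heterozygous sites, which illustrates why discarding unneeded candidate haplotypes before calling a solver can reduce both time and memory.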
