Mathematical programming strategies for solving the minimum common string partition problem

The minimum common string partition problem is an NP-hard combinatorial optimization problem with applications in computational biology. In this work we propose the first integer linear programming model for solving this problem. Building on this model, we also develop a deterministic two-phase heuristic that is applicable to larger problem instances. The results show that provably optimal solutions can be obtained for small and medium-sized problem instances from the literature by solving the proposed integer linear programming model with CPLEX. Furthermore, new best-known solutions are obtained for all considered problem instances from the literature. The heuristic, in turn, is shown to outperform competing heuristics from the related literature.
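To make the problem statement concrete, the following is a minimal brute-force sketch, assuming the standard definition of the problem: given two strings over the same multiset of characters, cut both into blocks so that the two multisets of blocks coincide, using as few blocks as possible. It is neither the integer linear programming model nor the two-phase heuristic proposed in this work. The function names `can_tile` and `mcsp_size` are hypothetical, and the enumeration is exponential, so it is only usable on very short strings.

```python
from collections import Counter
from itertools import combinations


def can_tile(target, blocks):
    """Return True if `target` can be written as a concatenation that uses
    every block in the multiset `blocks` (a Counter) exactly once."""
    if not target:
        # All characters are consumed; every block must have been used.
        return sum(blocks.values()) == 0
    for block in list(blocks):
        if blocks[block] > 0 and target.startswith(block):
            blocks[block] -= 1
            ok = can_tile(target[len(block):], blocks)
            blocks[block] += 1  # undo the choice before trying the next block
            if ok:
                return True
    return False


def mcsp_size(s1, s2):
    """Size of a minimum common string partition of s1 and s2.

    Illustrative brute force: tries every way of cutting s1 into k blocks,
    for increasing k, and checks whether the same blocks can be rearranged
    to spell s2.
    """
    if Counter(s1) != Counter(s2):
        raise ValueError("strings must contain the same multiset of characters")
    n = len(s1)
    if n == 0:
        return 0
    for k in range(1, n + 1):
        # Choose k - 1 cut positions strictly inside s1.
        for cuts in combinations(range(1, n), k - 1):
            bounds = (0, *cuts, n)
            blocks = Counter(s1[a:b] for a, b in zip(bounds, bounds[1:]))
            if can_tile(s2, blocks):
                return k
    return n  # unreachable: single-character blocks always work


if __name__ == "__main__":
    # AG|ACT|G and ACT|AG|G share the blocks {AG, ACT, G}.
    print(mcsp_size("AGACTG", "ACTAGG"))
```

For the pair `AGACTG` and `ACTAGG`, a common partition with the three blocks `AG`, `ACT` and `G` exists, and no partition with fewer blocks does, since the second string is neither equal to the first nor a rotation of it.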
