Structural transfer using EDAs: An application to multi-marker tagging SNP selection

In this paper we investigate transfer learning in evolutionary optimization using estimation of distribution algorithms (EDAs). We propose a framework for transfer learning between related optimization problems by means of structural transfer. We present different methods for augmenting or replacing the (possibly unavailable) structural information of the target optimization problem. As a test case we solve the multi-marker tagging single-nucleotide polymorphism (SNP) selection problem, a real-world problem from genetics. The introduced variants of structural transfer are validated in the computation of tagging SNPs on a database of 1167 individuals from 58 human populations worldwide. Our experimental results show significant improvements over EDAs that do not incorporate information from related problems.
