On the Homogenization of Data from Two Laboratories Using Genetic Programming

In experimental sciences, diversity tends to hinder the proper generalization of predictive models across data provided by different laboratories. Thus, training a model on a data set produced by one lab and testing it on data provided by another lab usually results in low classification accuracy. Even when the same protocols are followed, variability in measurements can introduce unforeseen variations that affect the quality of the model. This paper proposes a Genetic Programming-based approach in which a transformation of the data from the second lab is evolved, driven by classifier performance. A real-world problem, prostate cancer diagnosis, is presented as an example where the proposed approach was capable of repairing the fracture between the data of two different laboratories.
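The core idea (evolve a transformation of the second lab's data so that a classifier trained on the first lab performs well on it) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the paper's method: synthetic two-feature data stand in for the real measurements, lab 2 differs from lab 1 by an unknown affine distortion, a nearest-centroid classifier stands in for whatever classifier the authors used, each feature gets its own evolved expression tree, and evolution uses mutation only with truncation selection. All function names (`make_lab`, `evolve`, and so on) are hypothetical. The fitness here uses lab 2's labels directly; a real evaluation would hold out part of lab 2 to measure generalization.

```python
import random

# Hypothetical primitive set: binary operators; terminals are "x" (the raw feature) or constants.
OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}

def random_tree(rng, depth=3):
    """Grow a random expression tree over one input variable x."""
    if depth <= 0 or rng.random() < 0.3:
        return "x" if rng.random() < 0.5 else round(rng.uniform(-2, 2), 2)
    op = rng.choice(list(OPS))
    return (op, random_tree(rng, depth - 1), random_tree(rng, depth - 1))

def evaluate(tree, x):
    """Evaluate an expression tree at feature value x."""
    if tree == "x":
        return x
    if isinstance(tree, (int, float)):
        return tree
    op, left, right = tree
    return OPS[op](evaluate(left, x), evaluate(right, x))

def mutate(rng, tree, depth=3):
    """Replace a randomly chosen subtree with a freshly grown one (depth stays bounded)."""
    if not isinstance(tree, tuple) or rng.random() < 0.3:
        return random_tree(rng, depth)
    op, left, right = tree
    if rng.random() < 0.5:
        return (op, mutate(rng, left, depth - 1), right)
    return (op, left, mutate(rng, right, depth - 1))

def centroids(X, y):
    """Per-class mean vectors: a nearest-centroid classifier (stand-in choice)."""
    sums, counts = {}, {}
    for xi, yi in zip(X, y):
        s = sums.setdefault(yi, [0.0] * len(xi))
        for j, v in enumerate(xi):
            s[j] += v
        counts[yi] = counts.get(yi, 0) + 1
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}

def predict(cents, x):
    return min(cents, key=lambda c: sum((a - b) ** 2 for a, b in zip(cents[c], x)))

def accuracy(cents, X, y):
    return sum(predict(cents, xi) == yi for xi, yi in zip(X, y)) / len(y)

def fitness(ind, cents, X2, y2):
    """Accuracy of the lab-1 classifier on lab-2 data after applying the evolved transform."""
    transformed = [[evaluate(t, v) for t, v in zip(ind, xi)] for xi in X2]
    return accuracy(cents, transformed, y2)

def evolve(X1, y1, X2, y2, pop_size=30, generations=40, seed=0):
    rng = random.Random(seed)
    cents = centroids(X1, y1)  # classifier is trained on lab 1 only and never retrained
    n_feat = len(X2[0])
    # Seed with the identity transform so the search never ends below the untransformed baseline.
    pop = [["x"] * n_feat] + [[random_tree(rng) for _ in range(n_feat)] for _ in range(pop_size - 1)]
    for _ in range(generations):
        ranked = sorted(pop, key=lambda ind: fitness(ind, cents, X2, y2), reverse=True)
        survivors = ranked[: pop_size // 2]  # truncation selection keeps the elite
        children = []
        while len(survivors) + len(children) < pop_size:
            child = list(rng.choice(survivors))
            j = rng.randrange(n_feat)  # mutate the transform of one feature
            child[j] = mutate(rng, child[j])
            children.append(child)
        pop = survivors + children
    best = max(pop, key=lambda ind: fitness(ind, cents, X2, y2))
    return best, fitness(best, cents, X2, y2)

def make_lab(rng, n_per_class, scale=1.0, shift=0.0):
    """Hypothetical two-class data; a 'second lab' applies an affine distortion."""
    X, y = [], []
    for label, mu in ((0, 0.0), (1, 2.0)):
        for _ in range(n_per_class):
            X.append([scale * rng.gauss(mu, 0.5) + shift for _ in range(2)])
            y.append(label)
    return X, y

if __name__ == "__main__":
    rng = random.Random(42)
    X1, y1 = make_lab(rng, 50)                        # lab 1: reference data
    X2, y2 = make_lab(rng, 50, scale=0.5, shift=3.0)  # lab 2: distorted measurements
    baseline = accuracy(centroids(X1, y1), X2, y2)
    best, best_acc = evolve(X1, y1, X2, y2)
    print(f"untransformed: {baseline:.2f}  evolved: {best_acc:.2f}")
```

Because the identity transform is in the initial population and the best individuals always survive, the evolved accuracy can never fall below the untransformed baseline in this sketch; how much it improves depends on the distortion and the primitive set.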
