Multi-Class Protein Fold Recognition Using Multi-Objective Evolutionary Algorithms

Protein fold recognition (PFR) is an important approach to structure discovery that does not rely on sequence similarity. In pattern recognition terminology, PFR is a multi-class classification problem that can be solved with feature analysis and pattern classification techniques. This paper reformulates PFR as a multi-objective optimization problem (7) and proposes a Multi-Objective Feature Analysis and Selection Algorithm (MOFASA). We use support vector machines as the classifier. Experimental results on the Structural Classification of Proteins (SCOP) data set indicate that MOFASA achieves performance comparable to the results reported in (10). In addition, MOFASA identifies relevant features for further biological analysis.
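To make the multi-objective view of feature selection concrete, the following is a minimal sketch of Pareto dominance over candidate feature subsets. It is not MOFASA itself. The abstract does not name the objectives, so the pair used here (classification error and number of selected features, both minimized) is an assumption, and the candidate values are made-up illustrative numbers rather than results from the paper.

```python
from typing import List, Tuple

# Assumed objectives for one candidate feature subset (both minimized):
# (classification error, number of selected features).
Objectives = Tuple[float, int]


def dominates(a: Objectives, b: Objectives) -> bool:
    """Return True if a is no worse than b on every objective
    and strictly better on at least one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: List[Objectives]) -> List[Objectives]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical candidates; in practice the error would come from
# cross-validating a classifier (e.g. an SVM) on each feature subset.
candidates = [(0.40, 5), (0.35, 12), (0.35, 20), (0.50, 3), (0.45, 8)]
front = pareto_front(candidates)
print(front)
```

In this sketch, a subset stays on the front only if no other subset is at least as accurate with no more features. An evolutionary algorithm such as NSGA-II [2] searches for such trade-off sets rather than a single best subset.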

[1]  Bþ KHI, et al. Classification of Two-Class Cancer Data Reliably Using , .

[2]  Kalyanmoy Deb, et al. A fast and elitist multiobjective genetic algorithm: NSGA-II, 2002, IEEE Trans. Evol. Comput.

[3]  David C. Jones, et al. GenTHREADER: an efficient and reliable protein fold recognition method for genomic sequences, 1999, Journal of molecular biology.

[4]  T. N. Bhat, et al. The Protein Data Bank, 2000, Nucleic Acids Res.

[5]  Ron Kohavi, et al. Wrappers for Feature Subset Selection, 1997, Artif. Intell.

[6]  David C. Jones, et al. CATH--a hierarchic classification of protein domain structures, 1997, Structure.

[7]  Yves Deville, et al. Multi-class protein fold classification using a new ensemble machine learning approach, 2003, Genome informatics. International Conference on Genome Informatics.

[8]  D. E. Goldberg, et al. Genetic Algorithms in Search, 1989.

[9]  Chris H. Q. Ding, et al. Multi-class protein fold recognition using support vector machines and neural networks, 2001, Bioinform.

[10]  K. Chou, et al. Application of SVM to predict membrane protein types, 2004, Journal of theoretical biology.

[11]  S. Gutzwiller, et al. Robert S, 2002.

[12]  David E. Goldberg, et al. Genetic Algorithms in Search, Optimization and Machine Learning, 1988.

[13]  Mineichi Kudo, et al. Comparison of algorithms that select features for pattern classifiers, 2000, Pattern Recognit.

[14]  B. Rost, et al. Prediction of protein secondary structure at better than 70% accuracy, 1993, Journal of molecular biology.

[15]  Vladimir Vapnik, et al. Statistical learning theory, 1998.

[16]  Ponnuthurai N. Suganthan, et al. Feature Analysis and Classification of Protein Secondary Structure Data, 2003, ICANN.

[17]  Pat Langley, et al. Selection of Relevant Features and Examples in Machine Learning, 1997, Artif. Intell.

[18]  Sholom M. Weiss, et al. Estimating Performance Gains for Voted Decision Trees, 1998, Intell. Data Anal.

[19]  Anil K. Jain, et al. Feature Selection: Evaluation, Application, and Small Sample Performance, 1997, IEEE Trans. Pattern Anal. Mach. Intell.

[20]  Pierre Baldi, et al. Bioinformatics - the machine learning approach (2. ed.), 2000.

[21]  Chris Sander, et al. Protein folds and families: sequence and structure alignments, 1999, Nucleic Acids Res.

[22]  Robert S. Ledley, et al. The Protein Information Resource, 2003, Nucleic Acids Res.