Feature Analysis and Classification of Protein Secondary Structure Data

In this paper, we investigate feature analysis for predicting the secondary structure of protein sequences using support vector machines (SVMs) and the k-nearest neighbor (kNN) algorithm. We apply feature selection and feature scaling to obtain a number of distinct feature subsets, each containing different features and each scaled differently. Both the selection and the scaling are driven by mutual information (MI). We formulate feature selection and scaling as a combinatorial optimization problem and solve it with a Hopfield-style algorithm. Our experimental results show that feature subset selection improves the performance of both SVM and kNN, while feature scaling is consistently beneficial for kNN.
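
The sketch below illustrates the general pipeline described above: estimating MI between each feature and the class label, selecting a feature subset, weighting the retained features for the kNN distance, and evaluating SVM and kNN classifiers. It is a minimal illustration only, assuming scikit-learn and synthetic placeholder data; the paper's Hopfield-style combinatorial optimization of the subset and weights is not reproduced here, and the threshold and weighting scheme shown are illustrative assumptions.

```python
# Minimal sketch of MI-based feature selection and scaling for SVM/kNN.
# NOTE: uses a simple MI threshold and MI-proportional weights as stand-ins
# for the paper's Hopfield-style combinatorial optimization.
import numpy as np
from sklearn.datasets import make_classification  # placeholder data, not protein features
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Placeholder data standing in for encoded protein-sequence features.
X, y = make_classification(n_samples=500, n_features=40, n_informative=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Estimate mutual information between each feature and the class label.
mi = mutual_info_classif(X_tr, y_tr, random_state=0)

# Feature subset selection: keep features whose MI exceeds a threshold
# (median used here purely for illustration).
keep = mi > np.median(mi)

# Feature scaling: weight each retained feature by its normalized MI,
# so high-MI features contribute more to the kNN distance computation.
weights = mi[keep] / mi[keep].max()
X_tr_s = X_tr[:, keep] * weights
X_te_s = X_te[:, keep] * weights

knn = KNeighborsClassifier(n_neighbors=5).fit(X_tr_s, y_tr)
svm = SVC(kernel="rbf").fit(X_tr_s, y_tr)
print("kNN accuracy:", knn.score(X_te_s, y_te))
print("SVM accuracy:", svm.score(X_te_s, y_te))
```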
