Bandit-based local feature subset selection

Abstract: In this work, we propose a method for local feature subset selection that simultaneously partitions the sample space into localities and selects features for each of them. The partitions and their corresponding local features are represented using a novel notion of a feature tree. The problem of finding an appropriate feature tree is then formulated as a reinforcement learning problem, and a value-based Monte Carlo tree search with a corresponding credit-assignment policy is devised to learn near-optimal feature trees. Furthermore, the Monte Carlo tree search is extended to remain applicable when the number of actions (i.e., features) is large; this is achieved by combining a bandit-based explorative policy with a soft exploitative estimation policy. Results on synthetic datasets show that when local features are present in the data, the proposed method can outperform other feature selection methods. Results on microarray classification further show that the method obtains accuracy comparable to the state of the art using a simple k-NN classifier.
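To make the bandit-based exploration idea concrete, the sketch below treats each candidate feature as an arm of a multi-armed bandit and selects arms with a UCB1 score (mean reward plus an exploration bonus). This is a minimal illustration of the underlying bandit principle only, not the paper's actual feature-tree algorithm; the function names and the `evaluate` reward oracle are hypothetical.

```python
import math
import random

def ucb1_select(counts, values, total, c=math.sqrt(2)):
    """Pick the arm maximizing the UCB1 score; untried arms are pulled first."""
    best, best_score = None, -float("inf")
    for arm in counts:
        if counts[arm] == 0:
            return arm  # ensure every arm is sampled at least once
        score = values[arm] / counts[arm] + c * math.sqrt(math.log(total) / counts[arm])
        if score > best_score:
            best, best_score = arm, score
    return best

def bandit_feature_search(features, evaluate, iterations=200, seed=0):
    """Rank candidate features by repeatedly pulling the UCB1-chosen arm.

    `evaluate(feature)` is a (possibly noisy) reward, e.g. validation
    accuracy of a classifier built with that feature included.
    """
    random.seed(seed)
    counts = {f: 0 for f in features}
    values = {f: 0.0 for f in features}
    for t in range(1, iterations + 1):
        f = ucb1_select(counts, values, t)
        counts[f] += 1
        values[f] += evaluate(f)
    # Rank features by empirical mean reward, best first.
    return sorted(features, key=lambda f: values[f] / max(counts[f], 1), reverse=True)
```

In the paper's setting this selection rule would operate at each node of the Monte Carlo search tree over feature actions, which is what keeps the search tractable when the action set (the feature pool) is large.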
