Feature Selection for Unsupervised Learning

In this paper, we identify two issues in developing an automated feature subset selection algorithm for unlabeled data: the need to find the number of clusters in conjunction with feature selection, and the need to normalize the bias of feature selection criteria with respect to dimension. We explore the feature selection problem and these issues through FSSEM (Feature Subset Selection using Expectation-Maximization (EM) clustering) and through two performance criteria for evaluating candidate feature subsets: scatter separability and maximum likelihood. We present proofs of the dimensionality biases of these criteria, and a cross-projection normalization scheme that can be applied to any criterion to ameliorate these biases. Our experiments show the need for feature selection, the need to address these two issues, and the effectiveness of our proposed solutions.
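
The dimensionality bias and the cross-projection idea can be made concrete with a small sketch. The abstract gives no formulas, so the code below rests on two labeled assumptions: that scatter separability is computed as trace(S_w^{-1} S_b), a standard form of the criterion, and that cross-projection scores each feature subset by multiplying its criterion value with the value obtained when its clustering is projected onto the other subset. The function names and the dependency-free linear algebra are illustrative choices, not the authors' implementation.

```python
def _select(X, features):
    """Keep only the listed feature columns of each row."""
    return [[row[j] for j in features] for row in X]


def _mean(rows):
    """Column-wise mean of a list of equal-length rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def _trace_of_solve(A, B):
    """Return trace(A^{-1} B) via Gauss-Jordan elimination with partial pivoting."""
    d = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(d)]  # augmented matrix [A | B]
    for c in range(d):
        p = max(range(c, d), key=lambda r: abs(M[r][c]))
        if abs(M[p][c]) < 1e-12:
            raise ValueError("within-cluster scatter matrix is singular")
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(d):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    # M is now [I | A^{-1} B]; sum the diagonal of the right-hand block.
    return sum(M[i][d + i] for i in range(d))


def scatter_separability(X, labels, features):
    """Assumed criterion: trace(S_w^{-1} S_b) on the chosen features.

    `labels` is a fixed cluster assignment, one label per row of X.
    """
    Z = _select(X, features)
    n, d = len(Z), len(features)
    mu = _mean(Z)
    S_w = [[0.0] * d for _ in range(d)]  # within-cluster scatter
    S_b = [[0.0] * d for _ in range(d)]  # between-cluster scatter
    for k in set(labels):
        members = [z for z, lab in zip(Z, labels) if lab == k]
        mu_k = _mean(members)
        prior = len(members) / n
        for z in members:
            # Each point adds 1/n, i.e. prior times the cluster covariance.
            for a in range(d):
                for b in range(d):
                    S_w[a][b] += (z[a] - mu_k[a]) * (z[b] - mu_k[b]) / n
        for a in range(d):
            for b in range(d):
                S_b[a][b] += prior * (mu_k[a] - mu[a]) * (mu_k[b] - mu[b])
    return _trace_of_solve(S_w, S_b)


def cross_projection(crit, X, S1, C1, S2, C2):
    """Assumed product form of cross-projection normalization.

    S1, S2 are feature subsets; C1, C2 are the clusterings found in each.
    Each clustering is scored in its own subspace and in the other one.
    """
    value_1 = crit(X, C1, S1) * crit(X, C1, S2)
    value_2 = crit(X, C2, S2) * crit(X, C2, S1)
    return value_1, value_2
```

With the cluster assignment held fixed, adding a feature cannot decrease trace(S_w^{-1} S_b), so comparing raw values across subsets of different sizes favors the larger subset. Scoring both clusterings in both subspaces, as `cross_projection` does, puts the two candidates on a common footing. In a full search the clusterings `C1` and `C2` would come from running EM separately on each subset; here they are simply passed in.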
