Adaptive Feature Spaces For Land Cover Classification With Limited Ground Truth Data

Classification of land cover from hyperspectral data is challenging: typically tens of classes with uneven priors are involved, the inputs are high dimensional, and labeled data are often scarce. Several researchers have observed that it is often preferable to decompose a multiclass problem into multiple two-class problems, solve each subproblem with a suitable binary classifier, and then combine the outputs of this collection of classifiers to obtain the answer to the original multiclass problem. This approach is taken by the popular error-correcting output codes (ECOC) technique, as well as by the binary hierarchical classifier (BHC). Classical techniques for dealing with small sample sizes include regularization of covariance matrices and feature reduction. In this paper we address the twin problems of small sample sizes and multiclass settings by proposing a feature reduction scheme that adaptively adjusts to the amount of labeled data available. This scheme can be used in conjunction with ECOC and the BHC, as well as with other approaches, such as round-robin classification, that decompose a multiclass problem into a number of two-(meta)class problems. In particular, we develop the best-basis binary hierarchical classifier (BB-BHC) and best-basis ECOC (BB-ECOC) families of models, which are adapted to "small sample size" situations. Currently, few studies compare the efficacy of different approaches to multiclass problems, either in general settings or in the specific context of small sample sizes. Our experiments on two remote sensing data sets show that both BB-BHC and BB-ECOC methods are superior to their nonadaptive counterparts when faced with limited data, with the BB-BHC showing a slight edge in classification accuracy as well as interpretability.
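The ECOC-style combination step described above can be illustrated with a minimal sketch. The code matrix below is a hypothetical example (not taken from the paper): each row is a class codeword, each column defines one binary meta-class subproblem, and a test sample is assigned to the class whose codeword is closest in Hamming distance to the vector of binary classifier outputs.

```python
import numpy as np

# Hypothetical code matrix for 4 classes and 5 binary subproblems.
# Row k is the codeword for class k; column j says which side of
# dichotomy j each class falls on.
CODE = np.array([
    [0, 0, 1, 1, 0],
    [0, 1, 0, 1, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 1, 0, 0],
])

def ecoc_decode(bits, code=CODE):
    """Return the class whose codeword has the minimum Hamming
    distance to the binary classifier outputs `bits`."""
    dists = (code != np.asarray(bits)).sum(axis=1)
    return int(np.argmin(dists))
```

Because the rows of a well-designed code matrix are far apart in Hamming distance, the decoding step can recover the correct class even when one of the binary classifiers errs, which is the "error-correcting" aspect of ECOC.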
