Comparison of Neural and Statistical Classifiers - Theory and Practice

Pattern classification using neural networks and statistical methods is discussed. We first give a tutorial overview that groups popular classifiers into several distinct categories according to their underlying mathematical principles. Starting from the Bayes classifier, one division is whether the classifier explicitly estimates the class-conditional densities or directly estimates the posterior probabilities by regression. Another criterion is the flexibility of the architecture, in the sense of how rich the family of discriminant functions is. A third dimension is neural vs. nonneural learning: neural learning is characterized by simple local computations in a number of real or virtual processing elements. Based on these comparisons, a number of classification methods were selected for a case study that uses handwritten digit data. An effort was made to obtain fair estimates of their true classification performance, so cross-validation on the training set was used extensively to design the various classifiers, and the classification errors were estimated with an independent test set. The performance of a number of the most typical neural and statistical classifiers was compared. In addition, four methods of our own were used in the comparisons: Reduced Kernel Discriminant Analysis (RKDA), the Learning k-Nearest Neighbor Classifier, the Averaged Learning Subspace Method (ALSM), and a modified version of Kernel Discriminant Analysis. Committee classifiers and classification with rejection were also considered. In these experiments, the Local Linear Regression (LLR) method, although computationally prohibitively expensive, was the most accurate classifier, with the Averaged Learning Subspace Method (ALSM) following close behind. For methods having both a learning and a non-learning version, error-correcting learning seemed to give an advantage.
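The density-estimation branch of the taxonomy can be illustrated with a kernel discriminant rule: each class-conditional density is estimated with a Gaussian kernel estimator, the Bayes rule picks the class with the largest prior times density, and a pattern is rejected when the largest posterior falls below a threshold. The following is a minimal sketch, assuming an isotropic Gaussian kernel with one bandwidth `h` shared by all classes and priors equal to the class frequencies. It is not the RKDA method or the modified kernel discriminant analysis used in the study, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence

Vector = Sequence[float]


def log_sum_exp(values: List[float]) -> float:
    """Numerically stable log(sum(exp(v))), so distant patterns do not underflow."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def kda_posteriors(x: Vector, data: Dict[str, List[Vector]], h: float) -> Dict[str, float]:
    """Posterior P(class | x) from Gaussian kernel estimates of each class density.

    `data` maps a class label to its training vectors; priors are class
    frequencies. The kernel normalizing constant (2*pi*h^2)^(-d/2) is the same
    for every class, so it cancels in the posterior and is omitted.
    """
    n_total = sum(len(samples) for samples in data.values())
    log_joint = {}
    for label, samples in data.items():
        exponents = [
            -sum((a - b) ** 2 for a, b in zip(x, s)) / (2.0 * h * h) for s in samples
        ]
        log_prior = math.log(len(samples) / n_total)
        log_density = log_sum_exp(exponents) - math.log(len(samples))
        log_joint[label] = log_prior + log_density
    log_evidence = log_sum_exp(list(log_joint.values()))
    return {label: math.exp(v - log_evidence) for label, v in log_joint.items()}


def classify_with_reject(
    x: Vector, data: Dict[str, List[Vector]], h: float, threshold: float
) -> Optional[str]:
    """Bayes decision with a reject option: None if the max posterior < threshold."""
    posteriors = kda_posteriors(x, data, h)
    label, p = max(posteriors.items(), key=lambda item: item[1])
    return label if p >= threshold else None


# Usage: two well-separated toy classes in two dimensions.
data = {"a": [(0.0, 0.0), (0.2, 0.1)], "b": [(3.0, 3.0), (3.1, 2.9)]}
print(classify_with_reject((0.1, 0.0), data, h=0.5, threshold=0.9))   # a
print(classify_with_reject((1.55, 1.5), data, h=0.5, threshold=0.9))  # None (P(a|x) is about 0.65)
```

The reject threshold trades error rate against rejection rate: raising it rejects more ambiguous patterns, such as the one lying between the two classes above.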
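Cross-validation on the training set, used in the study to design the classifiers, can be sketched by choosing the neighborhood size `k` of a plain (non-learning) k-nearest-neighbor classifier with leave-one-out error. This is an assumed illustration: it uses squared Euclidean distance and majority voting, with ties broken in favor of the nearest tied neighbor. It does not reproduce the Learning k-Nearest Neighbor Classifier, and the function names are hypothetical.

```python
from collections import Counter
from typing import Hashable, List, Optional, Sequence, Tuple

Sample = Tuple[Sequence[float], Hashable]


def knn_predict(
    x: Sequence[float], train: List[Sample], k: int, skip: Optional[int] = None
) -> Hashable:
    """Majority vote among the k nearest training samples.

    `skip` excludes one training index; this is how leave-one-out is done.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    neighbors = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, v)), i, label)
        for i, (v, label) in enumerate(train)
        if i != skip
    )[:k]
    votes = Counter(label for _, _, label in neighbors)
    best = max(votes.values())
    # Neighbors are sorted nearest first, so a tie goes to the nearest tied label.
    return next(label for _, _, label in neighbors if votes[label] == best)


def loo_error(train: List[Sample], k: int) -> float:
    """Fraction of training samples misclassified when each is held out in turn."""
    wrong = sum(
        knn_predict(v, train, k, skip=i) != label for i, (v, label) in enumerate(train)
    )
    return wrong / len(train)


def choose_k(train: List[Sample], candidates: Sequence[int]) -> int:
    """Candidate k with the lowest leave-one-out error; smaller k wins ties."""
    return min(candidates, key=lambda k: (loo_error(train, k), k))


# Usage: 1-D data with one mislabeled point (1.5, "b") among the "a" samples.
noisy = [((0.0,), "a"), ((1.0,), "a"), ((2.0,), "a"),
         ((10.0,), "b"), ((11.0,), "b"), ((12.0,), "b"), ((1.5,), "b")]
print(loo_error(noisy, 1), loo_error(noisy, 3))  # 3/7 vs 1/7
print(choose_k(noisy, [1, 3]))                   # 3
```

Because every candidate is scored only on held-out training samples, the independent test set stays untouched until the final error estimate.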
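The Averaged Learning Subspace Method belongs to the family of subspace classifiers, which represent each class by a low-dimensional linear subspace and assign a pattern to the class whose subspace captures the largest part of its squared length. The sketch below shows only a non-learning, CLAFIC-style baseline of that family, in which each class subspace is spanned by the leading eigenvectors of the class autocorrelation matrix; the learning step by which ALSM adjusts the subspaces is not shown. It assumes pure-Python power iteration for the eigenvectors, and the subspace dimension `p` and all function names are illustrative choices.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def autocorrelation(samples: List[Vector]) -> List[List[float]]:
    """R = (1/n) * sum of x x^T over the class samples (no mean subtraction)."""
    d, n = len(samples[0]), len(samples)
    return [[sum(s[i] * s[j] for s in samples) / n for j in range(d)] for i in range(d)]


def leading_eigenvectors(R: List[List[float]], p: int, iters: int = 500) -> List[List[float]]:
    """Top-p eigenvectors of a symmetric PSD matrix by power iteration,
    re-orthogonalizing each iterate against the vectors already found."""
    d = len(R)
    basis: List[List[float]] = []
    for k in range(p):
        v = [1.0 + 0.1 * i + k for i in range(d)]  # arbitrary deterministic start
        for _ in range(iters):
            w = [dot(row, v) for row in R]
            for b in basis:  # Gram-Schmidt against earlier eigenvectors
                c = dot(w, b)
                w = [wi - c * bi for wi, bi in zip(w, b)]
            norm = math.sqrt(dot(w, w))
            if norm < 1e-12:
                raise ValueError("p exceeds the rank of R")
            v = [wi / norm for wi in w]
        basis.append(v)
    return basis


def clafic_fit(data: Dict[str, List[Vector]], p: int) -> Dict[str, List[List[float]]]:
    """One orthonormal p-dimensional subspace basis per class."""
    return {label: leading_eigenvectors(autocorrelation(s), p) for label, s in data.items()}


def clafic_predict(x: Vector, subspaces: Dict[str, List[List[float]]]) -> str:
    """Class whose subspace gives the largest squared projection ||P x||^2."""
    return max(subspaces, key=lambda label: sum(dot(x, b) ** 2 for b in subspaces[label]))


# Usage: class "a" lies near the x-axis, class "b" near the y-axis.
data = {
    "a": [(1.0, 0.1), (2.0, -0.1), (3.0, 0.05)],
    "b": [(0.1, 1.0), (-0.2, 2.0), (0.05, 3.0)],
}
subspaces = clafic_fit(data, p=1)
print(clafic_predict((5.0, 0.2), subspaces))  # a
print(clafic_predict((0.1, 4.0), subspaces))  # b
```

Since only the squared projection length matters, the arbitrary sign of each power-iteration eigenvector has no effect on the decision.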
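Committee classifiers combine the decisions of several member classifiers, and rejection fits naturally into the combination step. The abstract does not say which combination rules were compared, so the following is only an assumed example of one of the simplest: a plurality vote in which the committee rejects a pattern unless the winning label receives more than a fraction `quorum` of the votes. Members are any callables mapping a pattern to a label; the names are hypothetical.

```python
from collections import Counter
from typing import Callable, Hashable, Optional, Sequence, TypeVar

X = TypeVar("X")


def committee_vote(
    x: X, members: Sequence[Callable[[X], Hashable]], quorum: float = 0.5
) -> Optional[Hashable]:
    """Plurality vote over member classifiers with a reject option.

    Returns the most common label if it received more than `quorum` of the
    votes, otherwise None (reject). With the default quorum of 0.5 this is a
    strict-majority rule, so two labels cannot both win; for smaller quorums,
    ties go to the label encountered first (Counter.most_common ordering).
    """
    if not members:
        raise ValueError("committee needs at least one member")
    votes = Counter(member(x) for member in members)
    label, count = votes.most_common(1)[0]
    return label if count > quorum * len(members) else None


# Usage: three threshold classifiers on a scalar input.
members = [
    lambda x: "a" if x < 1 else "b",
    lambda x: "a" if x < 2 else "b",
    lambda x: "a" if x < 3 else "b",
]
print(committee_vote(0.5, members))              # a (3 of 3 votes)
print(committee_vote(1.5, members))              # a (2 of 3 votes)
print(committee_vote(1.5, members, quorum=0.7))  # None (2/3 is not above 0.7)
```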