Comparison of machine learning and traditional classifiers in glaucoma diagnosis

Glaucoma is a progressive optic neuropathy with characteristic structural changes in the optic nerve head that are reflected in the visual field. Visual-field sensitivity testing is commonly used in clinical settings to evaluate glaucoma. Standard automated perimetry (SAP) is a common computerized visual-field test whose output is amenable to machine learning. We compared the performance of several machine learning algorithms with that of the STATPAC indexes: mean deviation, pattern standard deviation, and corrected pattern standard deviation. The machine learning algorithms studied were multilayer perceptron (MLP), support vector machine (SVM), linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), Parzen window, mixture of Gaussian (MOG), and mixture of generalized Gaussian (MGG). MLP and SVM are classifiers that work directly on the decision boundary and fall under the discriminative paradigm. Generative classifiers, which first model the probability density of the data and then classify via Bayes' rule, usually give deeper insight into the structure of the data space. We applied the generative classifiers MOG, MGG, LDA, QDA, and Parzen window to the classification of glaucoma from SAP. The classifiers were compared by the areas under their receiver operating characteristic (ROC) curves and by their sensitivities (true-positive rates) at chosen specificities (true-negative rates). The machine-learning-type classifiers performed better than the best STATPAC indexes. Feature selection by forward selection and backward elimination further improved the classification rate and could also reduce testing time by reducing the number of visual-field locations that must be measured.
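
The two comparison criteria named above, area under the ROC curve and sensitivity at a chosen specificity, can be illustrated with a minimal sketch. It assumes each classifier outputs one score per eye, with higher scores meaning "more likely glaucoma". The function names and the example scores are hypothetical and are not taken from the study.

```python
from typing import Sequence


def roc_auc(scores_pos: Sequence[float], scores_neg: Sequence[float]) -> float:
    """Area under the ROC curve, computed as the Mann-Whitney statistic.

    This is the fraction of (glaucoma, normal) pairs in which the glaucoma
    eye gets the higher score; tied pairs count as one half.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def sensitivity_at_specificity(
    scores_pos: Sequence[float],
    scores_neg: Sequence[float],
    specificity: float,
) -> float:
    """Sensitivity at the lowest threshold that reaches `specificity`.

    An eye is called glaucomatous when its score is strictly above the
    threshold. Candidate thresholds are the observed normal-eye scores.
    """
    for threshold in sorted(set(scores_neg)):
        true_neg_rate = sum(n <= threshold for n in scores_neg) / len(scores_neg)
        if true_neg_rate >= specificity:
            # First (lowest) qualifying threshold gives the highest sensitivity.
            return sum(p > threshold for p in scores_pos) / len(scores_pos)
    return 0.0


# Hypothetical classifier scores, for illustration only.
glaucoma_scores = [0.9, 0.8, 0.6, 0.4]
normal_scores = [0.1, 0.2, 0.3, 0.5, 0.7]

auc = roc_auc(glaucoma_scores, normal_scores)
sens_80 = sensitivity_at_specificity(glaucoma_scores, normal_scores, 0.8)
print(f"AUC = {auc:.2f}, sensitivity at 80% specificity = {sens_80:.2f}")
```

Reporting sensitivity at a fixed specificity complements the area under the curve: two classifiers with similar areas can still differ at the high-specificity operating points that matter most for screening.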
