Feature selection using expected attainable discrimination

We propose expected attainable discrimination (EAD) as a measure for selecting discrete-valued features that discriminate reliably between two classes of data. EAD is the average of the areas under the receiver operating characteristic (ROC) curves obtained when a simple histogram probability density model is trained and tested on many random partitions of a data set. EAD can be incorporated into various stepwise search methods to identify promising subsets of features, and is particularly useful when misclassification costs are difficult or impossible to specify. We describe an experimental application to the problem of risk prediction in pregnancy.
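The procedure described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the smoothing constant `alpha`, the 50/50 train/test split, and the likelihood-ratio scoring rule are assumptions chosen for clarity. For each random partition, class-conditional histograms are estimated on the training half, test samples are scored by a smoothed likelihood ratio, and the resulting AUCs are averaged.

```python
import random
from collections import Counter

def auc(scores_pos, scores_neg):
    # Area under the ROC curve via the Mann-Whitney statistic:
    # fraction of (positive, negative) pairs ranked correctly, ties count 0.5.
    total = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                total += 1.0
            elif p == n:
                total += 0.5
    return total / (len(scores_pos) * len(scores_neg))

def ead(feature, labels, n_splits=50, train_frac=0.5, alpha=1.0, seed=0):
    """Expected attainable discrimination for one discrete-valued feature:
    the mean test-set AUC over many random partitions, scoring each test
    sample with a smoothed histogram estimate of the class-conditional
    probabilities (alpha is an assumed add-alpha smoothing constant)."""
    rng = random.Random(seed)
    values = sorted(set(feature))
    idx = list(range(len(feature)))
    aucs = []
    for _ in range(n_splits):
        rng.shuffle(idx)
        cut = int(train_frac * len(idx))
        train, test = idx[:cut], idx[cut:]
        # Histogram (count) model of each class on the training partition.
        counts = {0: Counter(), 1: Counter()}
        for i in train:
            counts[labels[i]][feature[i]] += 1
        n0 = sum(counts[0].values())
        n1 = sum(counts[1].values())
        def score(v):
            # Smoothed likelihood ratio P(v | class 1) / P(v | class 0).
            p1 = (counts[1][v] + alpha) / (n1 + alpha * len(values))
            p0 = (counts[0][v] + alpha) / (n0 + alpha * len(values))
            return p1 / p0
        pos = [score(feature[i]) for i in test if labels[i] == 1]
        neg = [score(feature[i]) for i in test if labels[i] == 0]
        if pos and neg:
            aucs.append(auc(pos, neg))
    return sum(aucs) / len(aucs)
```

An informative feature (one whose distribution differs between the classes) yields an EAD well above 0.5, while a constant feature scores every test sample identically and yields exactly 0.5; ranking candidate features by this value is what drives the stepwise subset search.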