Selection of features that will permit accurate pattern classification is, in general, a difficult task. However, if a particular data set is represented by discrete-valued features, it becomes possible to determine empirically the contribution that each feature makes to the discrimination between classes. We describe how to calculate the maximum discrimination possible in a two-alternative forced-choice decision problem when discrete-valued features are used to represent a given data set. (In this paper, we measure discrimination in terms of the area under the receiver operating characteristic (ROC) curve.) Since this bound corresponds to the upper limit of classification achievable by any classifier (with that given data representation), we can use it to assess whether recognition errors are due to a lack of separability in the data or to shortcomings in the classification technique. In comparison to the training and testing of artificial neural networks, the empirical bound on discrimination can be calculated efficiently, allowing an experimenter to decide whether subsequent development of neural network models is warranted. We extend the discrimination bound method so that we can estimate both the maximum and the average discrimination we can expect on unseen test data. These estimation techniques are the basis of a backwards elimination algorithm that can be used to rank features in order of their discriminative power. We use two problems to demonstrate this feature selection process: classification of the Mushroom Database, and a real-world, pregnancy-related medical risk prediction task: assessment of the risk of perinatal death.
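To illustrate the idea behind such a bound (as a sketch only; the function name and details below are illustrative, not the authors' implementation): with discrete-valued features, any classifier can at best assign one score per distinct feature pattern, so the maximum attainable ROC area is obtained by ranking patterns by their empirical likelihood ratio, with examples from the two classes that share a pattern contributing irreducible ties.

```python
from collections import Counter

def max_auc_discrete(features, labels):
    """Empirical upper bound on ROC area for discrete-valued data.

    Distinct feature patterns are ordered by the likelihood ratio
    n_pos(v) / n_neg(v); positives and negatives sharing a pattern
    are tied and each tied pair contributes 0.5 to the AUC count.
    `labels` are 1 (positive class) or 0 (negative class).
    """
    pos = Counter(tuple(f) for f, y in zip(features, labels) if y == 1)
    neg = Counter(tuple(f) for f, y in zip(features, labels) if y == 0)
    n1, n0 = sum(pos.values()), sum(neg.values())

    def ratio(v):  # empirical likelihood ratio of pattern v
        return pos[v] / neg[v] if neg[v] else float("inf")

    ordered = sorted(set(pos) | set(neg), key=ratio, reverse=True)
    concordant, ties = 0.0, 0.0
    negs_below = n0  # negatives ranked strictly below the current pattern
    for v in ordered:
        negs_below -= neg[v]
        concordant += pos[v] * negs_below  # positive above negative pairs
        ties += pos[v] * neg[v]            # shared-pattern tied pairs
    return (concordant + 0.5 * ties) / (n1 * n0)
```

A bound of 1.0 means the representation separates the classes perfectly; a bound near 0.5 means the errors stem from the data representation itself, and no classifier trained on those features could do better.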