Using upper bounds on attainable discrimination to select discrete-valued features

Selecting features that permit accurate pattern classification is, in general, a difficult task. However, if a particular data set is represented by discrete-valued features, it becomes possible to determine empirically the contribution that each feature makes to the discrimination between classes. We describe how to calculate the maximum discrimination attainable in a two-alternative forced-choice decision problem when discrete-valued features are used to represent a given data set. (In this paper, we measure discrimination as the area under the receiver operating characteristic (ROC) curve.) Since this bound corresponds to the upper limit of classification performance achievable by any classifier with that data representation, it can be used to assess whether recognition errors are due to a lack of separability in the data or to shortcomings in the classification technique. Compared with training and testing artificial neural networks, the empirical bound on discrimination can be calculated efficiently, allowing an experimenter to decide whether subsequent development of neural network models is warranted. We extend the discrimination-bound method to estimate both the maximum and the average discrimination that can be expected on unseen test data. These estimation techniques form the basis of a backward-elimination algorithm that ranks features in order of their discriminative power. We demonstrate this feature-selection process on two problems: classification of the Mushroom Database, and a real-world, pregnancy-related medical risk-prediction task, the assessment of the risk of perinatal death.
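The abstract does not spell out the computation, but a minimal sketch of how such an empirical bound might be obtained is given below, assuming the bound is taken as the AUC of the empirical likelihood-ratio scorer over distinct feature-value combinations: samples are grouped by their discrete feature vectors, each group is scored by its empirical positive-class fraction (monotone in the likelihood ratio), and the AUC of that scorer is the best discrimination any classifier could achieve with that representation. The function name `max_auc_discrete` and the use of NumPy/SciPy are illustrative choices, not part of the paper's own description.

```python
import numpy as np
from scipy.stats import rankdata

def max_auc_discrete(X, y):
    """Empirical upper bound on the area under the ROC curve (AUC)
    attainable with a discrete-valued representation.

    Samples sharing the same feature vector are indistinguishable, so
    the best possible scorer assigns each such group a single score;
    ordering groups by their empirical likelihood ratio (equivalently,
    by the fraction of positives in the group) maximises the AUC.
    """
    X, y = np.asarray(X), np.asarray(y)
    # Map each distinct discrete feature vector to a group index.
    _, group = np.unique(X, axis=0, return_inverse=True)
    n_groups = group.max() + 1
    pos = np.bincount(group[y == 1], minlength=n_groups)  # positives per group
    neg = np.bincount(group[y == 0], minlength=n_groups)  # negatives per group
    # Score each sample by its group's positive fraction.
    score = (pos / np.maximum(pos + neg, 1))[group]
    # AUC via the Mann-Whitney U statistic; average ranks handle ties,
    # so indistinguishable positive/negative pairs contribute 1/2.
    P, N = pos.sum(), neg.sum()
    ranks = rankdata(score)
    return (ranks[y == 1].sum() - P * (P + 1) / 2) / (P * N)

# Example with three hypothetical binary features: the bound reflects
# overlap in the discrete cells, not any particular classifier.
if __name__ == "__main__":
    X = [[0, 0, 1], [0, 0, 1], [1, 0, 1], [1, 1, 0], [1, 1, 0], [0, 1, 0]]
    y = [0, 1, 1, 1, 0, 0]
    print(max_auc_discrete(X, y))
```

Because the bound depends only on counting class membership within each distinct feature-value cell, it can be recomputed cheaply after removing a feature, which is what makes the backward-elimination ranking described above practical.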