Knowing What Doesn't Matter: Exploiting the Omission of Irrelevant Data

Abstract: Most learning algorithms work most effectively when their training data contain completely specified labeled samples. In many diagnostic tasks, however, the data will include the values of only some of the attributes; we model this as a blocking process that hides the values of those attributes from the learner. While blockers that remove the values of critical attributes can handicap a learner, this paper instead focuses on blockers that remove only conditionally irrelevant attribute values, i.e., values that are not needed to classify an instance, given the values of the other unblocked attributes. We first motivate and formalize this model of "superfluous-value blocking", and then demonstrate that these omissions can be useful, by proving that certain classes that seem hard to learn in the general PAC model (viz., decision trees and DNF formulae) are trivial to learn in this setting. We then extend this model to deal with (1) theory revision (i.e., modifying an existing formula); (2) blockers that occasionally include superfluous values or exclude required values; and (3) other corruptions of the training data.
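To make the setting concrete, here is a minimal Python sketch of the superfluous-value blocking model and the trivial DNF learner it enables. This is an illustration under assumptions made here, not the paper's implementation: the names block and learn_dnf, and the encoding of a term as a partial assignment, are all hypothetical. The idea is that, for a positive instance, an ideal blocker reveals only the attributes of one term the instance satisfies, so each blocked positive example hands the learner a term of the target formula verbatim.

```python
# Minimal sketch (assumed names, not the paper's code) of superfluous-value
# blocking for DNF. A term is a partial assignment: attribute index -> value.
from typing import Dict, List, Optional

Term = Dict[int, bool]

def block(instance: List[bool], dnf: List[Term]) -> Optional[Dict[int, bool]]:
    """Superfluous-value blocker: on a positive instance, hide every attribute
    not needed to classify it, i.e. reveal only one satisfied term."""
    for term in dnf:
        if all(instance[i] == v for i, v in term.items()):
            return {i: instance[i] for i in term}  # reveal this term's attributes only
    return None  # negative instance: no term is satisfied

def learn_dnf(blocked_positives: List[Dict[int, bool]]) -> List[Term]:
    """Trivial learner: each blocked positive example *is* a term of the target."""
    terms: List[Term] = []
    for partial in blocked_positives:
        if partial not in terms:
            terms.append(partial)
    return terms

# Usage: target = (x0 AND NOT x2) OR (x1 AND x3)
target = [{0: True, 2: False}, {1: True, 3: True}]
sample = [True, False, False, True]   # satisfies the first term
revealed = block(sample, target)      # -> {0: True, 2: False}
print(learn_dnf([revealed]))          # recovers one term of the target formula
```

Under this idealized blocker, learning reduces to collecting and deduplicating the revealed partial assignments, which is why DNF, notoriously hard in the general PAC model, becomes trivial in this setting.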
