Sample Compression, Learnability, and the Vapnik-Chervonenkis Dimension

Abstract

Within the framework of pac-learning, we explore the learnability of concepts from samples using the paradigm of sample compression schemes. A sample compression scheme of size k for a concept class $C \subseteq 2^X$ consists of a compression function and a reconstruction function. The compression function receives a finite sample set consistent with some concept in C and chooses a subset of k examples as the compression set. The reconstruction function forms a hypothesis on X from a compression set of k examples. For any sample set of a concept in C, the compression set produced by the compression function must lead to a hypothesis consistent with the whole original sample set when it is fed to the reconstruction function. We demonstrate that the existence of a sample compression scheme of fixed size for a class C is sufficient to ensure that the class C is pac-learnable.

Previous work has shown that a class is pac-learnable if and only if the Vapnik-Chervonenkis (VC) dimension of the class is finite. In the second half of this paper we explore the relationship between sample compression schemes and the VC dimension. We define maximum and maximal classes of VC dimension d. For every maximum class of VC dimension d, there is a sample compression scheme of size d, and for sufficiently large maximum classes there is no sample compression scheme of size less than d. We briefly discuss classes of VC dimension d that are maximal but not maximum. It is an open question whether every class of VC dimension d has a sample compression scheme of size O(d).
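
As a concrete illustration of the definitions in the first paragraph, the sketch below gives a sample compression scheme of size at most 4 for the class of axis-aligned rectangles in the plane, a standard example of a class of VC dimension 4. The choice of this class and the names `compress` and `reconstruct` are assumptions made here for illustration; they are not taken from the paper. The compression function keeps the extreme positive examples, and the reconstruction function returns their bounding box, which is consistent with the whole original sample whenever that sample is consistent with some rectangle in the class.

```python
# A minimal sketch (not from the paper) of a sample compression scheme of
# size at most 4 for the concept class of axis-aligned rectangles in the
# plane.  `compress` plays the role of the compression function and
# `reconstruct` the role of the reconstruction function.

from typing import Callable, List, Tuple

Point = Tuple[float, float]
Example = Tuple[Point, bool]          # (point, label)

def compress(sample: List[Example]) -> List[Example]:
    """Compression function: keep at most 4 extreme positive examples."""
    positives = [p for p, label in sample if label]
    if not positives:
        return []                     # empty compression set -> empty concept
    extremes = {
        min(positives, key=lambda p: p[0]),   # leftmost positive
        max(positives, key=lambda p: p[0]),   # rightmost positive
        min(positives, key=lambda p: p[1]),   # bottommost positive
        max(positives, key=lambda p: p[1]),   # topmost positive
    }
    return [(p, True) for p in extremes]

def reconstruct(compression_set: List[Example]) -> Callable[[Point], bool]:
    """Reconstruction function: the bounding box of the compression set."""
    if not compression_set:
        return lambda p: False        # hypothesis labelling everything negative
    xs = [p[0] for p, _ in compression_set]
    ys = [p[1] for p, _ in compression_set]
    lo_x, hi_x, lo_y, hi_y = min(xs), max(xs), min(ys), max(ys)
    return lambda p: lo_x <= p[0] <= hi_x and lo_y <= p[1] <= hi_y

# If the sample is consistent with some target rectangle, the bounding box of
# its positive examples is contained in that rectangle, so the reconstructed
# hypothesis agrees with every example in the original sample.
sample = [((1, 1), True), ((3, 2), True), ((2, 4), True), ((5, 5), False)]
hypothesis = reconstruct(compress(sample))
assert all(hypothesis(p) == label for p, label in sample)
```

The design mirrors the definition: the compression set is a subset of at most 4 examples from the sample, and the hypothesis produced from it must classify every example of the original sample correctly.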
