Sample compression, learnability, and the Vapnik-Chervonenkis dimension

Abstract. Within the framework of pac-learning, we explore the learnability of concepts from samples using the paradigm of sample compression schemes. A sample compression scheme of size k for a concept class $$C \subseteq 2^X$$ consists of a compression function and a reconstruction function. The compression function receives a finite sample set consistent with some concept in C and chooses a subset of k examples as the compression set. The reconstruction function forms a hypothesis on X from a compression set of k examples. For any sample set of a concept in C, the compression set produced by the compression function must lead to a hypothesis consistent with the whole original sample set when it is fed to the reconstruction function. We demonstrate that the existence of a fixed-size sample compression scheme for a class C is sufficient to ensure that the class C is pac-learnable.

Previous work has shown that a class is pac-learnable if and only if the Vapnik-Chervonenkis (VC) dimension of the class is finite. In the second half of this paper we explore the relationship between sample compression schemes and the VC dimension. We define maximum and maximal classes of VC dimension d. For every maximum class of VC dimension d, there is a sample compression scheme of size d, and for sufficiently large maximum classes there is no sample compression scheme of size less than d. We briefly discuss classes of VC dimension d that are maximal but not maximum. It is an open question whether every class of VC dimension d has a sample compression scheme of size O(d).
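As a concrete illustration of these definitions (an illustrative sketch, not code from the paper), consider the class of axis-parallel rectangles in the plane, which has VC dimension 4. Keeping the leftmost, rightmost, bottommost, and topmost positive examples gives a compression set of size at most 4, and the smallest axis-parallel rectangle containing that set reconstructs a consistent hypothesis. A minimal Python sketch, assuming labeled examples of the form ((x, y), label); the names compress and reconstruct are ours:

    # Hypothetical sketch of a size-4 sample compression scheme for
    # axis-parallel rectangles in the plane (not taken from the paper).

    def compress(sample):
        # Keep at most 4 positive examples: the extreme points in each
        # coordinate. Assumes the sample is consistent with some rectangle.
        positives = [p for p, label in sample if label]
        if not positives:
            return []                              # empty compression set
        keep = {
            min(positives, key=lambda p: p[0]),    # leftmost
            max(positives, key=lambda p: p[0]),    # rightmost
            min(positives, key=lambda p: p[1]),    # bottommost
            max(positives, key=lambda p: p[1]),    # topmost
        }
        return [(p, True) for p in keep]

    def reconstruct(compression_set):
        # Hypothesis: the smallest axis-parallel rectangle containing the
        # compression set (all-negative if the set is empty).
        points = [p for p, _ in compression_set]
        if not points:
            return lambda q: False
        x_lo = min(x for x, _ in points)
        x_hi = max(x for x, _ in points)
        y_lo = min(y for _, y in points)
        y_hi = max(y for _, y in points)
        return lambda q: x_lo <= q[0] <= x_hi and y_lo <= q[1] <= y_hi

    # Usage: a sample consistent with the rectangle [0, 2] x [0, 1].
    sample = [((0.5, 0.5), True), ((1.5, 0.2), True), ((3.0, 0.5), False),
              ((1.0, 0.9), True), ((-1.0, 0.0), False)]
    h = reconstruct(compress(sample))
    assert all(h(p) == label for p, label in sample)

If the sample is consistent with some rectangle R, every positive example lies in the bounding box of the positives and every negative example lies outside R, hence outside that box, so the reconstructed hypothesis is consistent with the whole original sample, as the definition of a compression scheme requires.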
