Under the prediction model of learning, a prediction strategy is presented with an i.i.d. sample of n − 1 points in X and corresponding labels from a concept f ∈ F, and aims to minimize the worst-case probability of erring on an nth point. By exploiting the structure of F, Haussler et al. achieved a VC(F)/n bound for the natural one-inclusion prediction strategy, improving on bounds implied by PAC-type results by an O(log n) factor. The key data structure in their result is the natural subgraph of the hypercube, the one-inclusion graph; the key step is a d = VC(F) bound on one-inclusion graph density. The first main result of this paper is a density bound of $n \binom{n-1}{\leq d-1} / \binom{n}{\leq d} < d$, where $\binom{m}{\leq k}$ denotes the partial binomial sum $\sum_{i=0}^{k} \binom{m}{i}$. This positively resolves a conjecture of Kuzmin & Warmuth relating to their unlabeled Peeling compression scheme, and also leads to an improved mistake bound for the randomized (respectively, deterministic) one-inclusion strategy for all d (respectively, for d = Θ(n)). The proof uses a new form of VC-invariant shifting and a group-theoretic symmetrization. Our second main result is a k-class analogue of the d/n mistake bound, replacing the VC-dimension by the Pollard pseudo-dimension and the one-inclusion strategy by its natural hypergraph generalization. This bound on expected risk improves on known PAC-based results by a factor of O(log n) and is shown to be optimal up to an O(log k) factor. The combinatorial technique of shifting takes a central role in understanding the one-inclusion (hyper)graph and is a running theme throughout.
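The density bound above is easy to sanity-check numerically. The following sketch (not from the paper; the function names are illustrative) evaluates $n \binom{n-1}{\leq d-1} / \binom{n}{\leq d}$ with Python's standard-library `math.comb` and confirms it stays strictly below d over a range of (n, d) pairs:

```python
from math import comb


def binom_leq(m, k):
    """Partial binomial sum C(m, 0) + C(m, 1) + ... + C(m, k)."""
    return sum(comb(m, i) for i in range(k + 1))


def density_bound(n, d):
    """The density bound n * C(n-1, <= d-1) / C(n, <= d),
    claimed to be strictly less than d."""
    return n * binom_leq(n - 1, d - 1) / binom_leq(n, d)


# Verify the strict inequality on small cases.
for n in range(2, 25):
    for d in range(1, n + 1):
        assert density_bound(n, d) < d
```

The inequality follows from the identity $i \binom{n}{i} = n \binom{n-1}{i-1}$: the ratio is a weighted average of the values 0, 1, ..., d, which is strictly below d because the i = 0 term appears in the denominator.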
[1] Leslie G. Valiant, et al., "A general lower bound on the number of examples needed for learning," COLT '88, 1988.
[2] David Haussler, et al., "Sphere Packing Numbers for Subsets of the Boolean n-Cube with Bounded Vapnik-Chervonenkis Dimension," J. Comb. Theory, Ser. A, 1995.
[3] Yi Li, et al., "The one-inclusion graph algorithm is near-optimal for the prediction model of learning," IEEE Trans. Inf. Theory, 2001.
[4] Philip M. Long, et al., "Characterizations of Learnability for Classes of {0, ..., n}-Valued Functions," J. Comput. Syst. Sci., 1995.
[5] Manfred K. Warmuth, et al., "Unlabeled Compression Schemes for Maximum Classes," COLT, 2007.
[6] Norbert Sauer, et al., "On the Density of Families of Sets," J. Comb. Theory, Ser. A, 1972.
[7] David Haussler, et al., "Predicting {0,1}-functions on randomly drawn points," COLT '88, 1988.
[8] Manfred K. Warmuth, et al., "Relating Data Compression and Learnability," 2003.
[9] Shai Ben-David, et al., "Characterizations of learnability for classes of {0, ..., n}-valued functions," COLT '92, 1992.