Shifting, One-Inclusion Mistake Bounds and Tight Multiclass Expected Risk Bounds

Under the prediction model of learning, a prediction strategy is presented with an i.i.d. sample of n − 1 points in X and corresponding labels from a concept f ∈ F, and aims to minimize the worst-case probability of erring on an nth point. By exploiting the structure of F, Haussler et al. achieved a VC(F)/n bound for the natural one-inclusion prediction strategy, improving on bounds implied by PAC-type results by an O(log n) factor. The key data structure in their result is the natural subgraph of the hypercube, the one-inclusion graph; the key step is a d = VC(F) bound on one-inclusion graph density. The first main result of this paper is a density bound of $n\binom{n-1}{\le d-1} / \binom{n}{\le d} < d$, which positively resolves a conjecture of Kuzmin & Warmuth relating to their unlabeled Peeling compression scheme and also leads to an improved mistake bound for the randomized (deterministic) one-inclusion strategy for all d (for d ≈ Θ(n)). The proof uses a new form of VC-invariant shifting and a group-theoretic symmetrization. Our second main result is a k-class analogue of the d/n mistake bound, replacing the VC-dimension by the Pollard pseudo-dimension and the one-inclusion strategy by its natural hypergraph generalization. This bound on expected risk improves on known PAC-based results by a factor of O(log n) and is shown to be optimal up to an O(log k) factor. The combinatorial technique of shifting takes a central role in understanding the one-inclusion (hyper)graph and is a running theme throughout.
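
As a minimal illustration of the quantity controlled by the density bound (not taken from the paper; the function name and the toy class below are our own), the following Python sketch builds the one-inclusion graph of a finite binary class projected onto a sample of n points and computes its edge-to-vertex density. Roughly, dividing such a density bound by n is what yields mistake bounds of the kind discussed above.

    # A hedged sketch: one-inclusion graph density of a class projected onto a sample.
    # Vertices are the distinct 0/1 labelings; edges join labelings differing in
    # exactly one coordinate.
    from itertools import combinations

    def one_inclusion_density(patterns):
        """patterns: iterable of distinct 0/1 tuples of length n.
        Returns |E| / |V| for the one-inclusion graph."""
        verts = list(set(patterns))
        edges = sum(
            1
            for u, v in combinations(verts, 2)
            if sum(a != b for a, b in zip(u, v)) == 1
        )
        return edges / len(verts)

    # Toy class: all labelings of n = 3 points with at most one 1 (VC dimension d = 1).
    patterns = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
    print(one_inclusion_density(patterns))
    # 0.75, which equals the stated expression n*C(n-1, <=d-1)/C(n, <=d) = 3*1/4
    # here and sits below the classical bound d = 1.
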