ON THE OPTIMALITY OF SAMPLE-BASED ESTIMATES OF THE EXPECTATION OF THE EMPIRICAL MINIMIZER

We study sample-based estimates of the expectation of the function produced by the empirical minimization algorithm, and investigate the extent to which the rate of convergence of the empirical minimizer can be estimated in a data-dependent manner. We establish three main results. First, we provide an algorithm that upper bounds the expectation of the empirical minimizer in a completely data-dependent manner. The bound is based on a structural result due to Bartlett and Mendelson that relates expectations to sample averages. Second, we show that these structural upper bounds can be significantly loose. In particular, we exhibit a class for which the expectation of the empirical minimizer decreases as O(1/n) with the sample size n, while the upper bound based on structural properties is Ω(1). Third, we show that this looseness of the bound is inevitable: we give an example showing that a sharp bound cannot, in general, be recovered from empirical data.

[1] P. Massart and É. Nédélec. Risk bounds for statistical learning, 2007, arXiv:math/0702683.

[2] M. Rudelson and R. Vershynin. Combinatorics of random processes and sections of convex bodies, 2004, arXiv:math/0404192.

[3] P. Massart. Some applications of concentration inequalities to statistics, 2000.

[4] S. Mendelson. A Few Notes on Statistical Learning Theory, 2002, Machine Learning Summer School.

[5] G. Blanchard, G. Lugosi and N. Vayatis. On the Rate of Convergence of Regularized Boosting Classifiers, 2003, J. Mach. Learn. Res.

[6] O. Bousquet. Concentration Inequalities and Empirical Processes Theory Applied to the Analysis of Learning Algorithms, 2002.

[7] S. van de Geer. A New Approach to Least-Squares Estimation, with Applications, 1986.

[8] M. Ledoux. The Concentration of Measure Phenomenon, 2001.

[9] M. Talagrand. New concentration inequalities in product spaces, 1996.

[10] B. Tarigan and S. van de Geer. Adaptivity of Support Vector Machines with ℓ1 Penalty, 2004.

[11] V. Koltchinskii. Local Rademacher complexities and oracle inequalities in risk minimization, 2006, arXiv:0708.0083.

[12] G. Lugosi and N. Vayatis. On the Bayes-risk consistency of regularized boosting methods, 2003.

[13] A. W. van der Vaart and J. A. Wellner. Weak Convergence and Empirical Processes: With Applications to Statistics, 1996.

[14] T. Klein. Une inégalité de concentration à gauche pour les processus empiriques [A left-sided concentration inequality for empirical processes], 2002.

[15] E. Rio. Inégalités de concentration pour les processus empiriques de classes de parties [Concentration inequalities for empirical processes over classes of sets], 2001.

[16] S. van de Geer. Empirical Processes in M-Estimation, 2000.

[17] P. L. Bartlett, O. Bousquet and S. Mendelson. Local Rademacher complexities, 2005, arXiv:math/0508275.

[18] S. Mendelson. Improving the sample complexity using global data, 2002, IEEE Trans. Inf. Theory.

[19] S. Boucheron, G. Lugosi and P. Massart. Concentration inequalities using the entropy method, 2002.

[20] R. M. Dudley. Uniform Central Limit Theorems, 1999.

[21] V. N. Vapnik and A. Ya. Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities, 1971.

[22] P. L. Bartlett and S. Mendelson. Empirical minimization, 2006.

[23] D. Haussler. Sphere Packing Numbers for Subsets of the Boolean n-Cube with Bounded Vapnik-Chervonenkis Dimension, 1995, J. Comb. Theory, Ser. A.

[24] G. Lugosi and M. Wegkamp. Complexity regularization via localized random penalties, 2004, arXiv:math/0410091.

[25] P. L. Bartlett, M. I. Jordan and J. D. McAuliffe. Convexity, Classification, and Risk Bounds, 2006.

[26] A. B. Tsybakov. Optimal aggregation of classifiers in statistical learning, 2003.

[27] M. Talagrand. Sharper Bounds for Gaussian and Empirical Processes, 1994.

[28] W. S. Lee, P. L. Bartlett and R. C. Williamson. The Importance of Convexity in Learning with Squared Loss, 1998, IEEE Trans. Inf. Theory.

[29] V. Koltchinskii. Rejoinder: Local Rademacher complexities and oracle inequalities in risk minimization, 2006, arXiv:0708.0135.

[30] V. Koltchinskii and D. Panchenko. Rademacher Processes and Bounding the Risk of Function Learning, 2004, arXiv:math/0405338.

[31] P. L. Bartlett and M. H. Wegkamp. Classification with a Reject Option using a Hinge Loss, 2008, J. Mach. Learn. Res.

[32] P. Massart. About the constants in Talagrand's concentration inequalities for empirical processes, 2000.