Learning Near-optimal Convex Combinations of Basis Models with Generalization Guarantees

The problem of learning an optimal convex combination of basis models has been studied in a number of works, with a focus on theoretical analysis but little investigation of the empirical performance of the approach. In this paper, we present new theoretical insights and empirical results that demonstrate the effectiveness of the approach. Theoretically, we first consider whether convex combinations can be replaced by linear combinations while retaining convergence results similar to those known for learning from a convex hull. We present a negative result showing that the linear hull of very simple basis functions can have unbounded capacity and is thus prone to overfitting; convex hulls, in contrast, remain rich but have bounded capacity. In addition, we obtain a generalization bound for a general class of Lipschitz loss functions. Empirically, we discuss how a convex combination can be learned greedily with early stopping, and non-greedily when the number of basis models is known a priori. Our experiments suggest that the greedy scheme is competitive with or better than several baselines, including boosting and random forests. The greedy algorithm requires little hyper-parameter tuning and appears to adapt to the underlying complexity of the problem.
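To make the greedy scheme concrete, the following is a minimal Frank-Wolfe-style sketch (not the paper's exact algorithm) that learns a convex combination over a fixed pool of pre-fitted basis models under squared loss, with early stopping on a held-out validation set. The function name, the 2/(t+2) step-size schedule, and the patience-based stopping rule are illustrative assumptions.

```python
import numpy as np

def greedy_convex_combination(preds_train, y_train, preds_val, y_val,
                              max_rounds=200, patience=10):
    """Greedily learn a convex combination of fixed basis-model predictions.

    At round t, pick the basis model whose predictions best align with the
    negative residual (the linear minimizer over the simplex) and mix it in
    with step size 2 / (t + 2), so the weights stay on the simplex.
    `preds_*` are (n_samples, n_models) arrays of basis predictions.
    """
    n_models = preds_train.shape[1]
    weights = np.zeros(n_models)
    f_train = np.zeros_like(y_train, dtype=float)
    f_val = np.zeros_like(y_val, dtype=float)

    best_val, best_weights, stall = np.inf, weights.copy(), 0
    for t in range(max_rounds):
        residual = f_train - y_train            # gradient of 0.5 * squared loss w.r.t. f
        scores = preds_train.T @ residual       # linearized objective at each simplex vertex
        j = int(np.argmin(scores))              # single basis model minimizing the linearization
        gamma = 2.0 / (t + 2.0)                 # standard Frank-Wolfe step size
        weights *= (1.0 - gamma)
        weights[j] += gamma
        f_train = (1.0 - gamma) * f_train + gamma * preds_train[:, j]
        f_val = (1.0 - gamma) * f_val + gamma * preds_val[:, j]

        val_loss = np.mean((f_val - y_val) ** 2)
        if val_loss < best_val - 1e-8:
            best_val, best_weights, stall = val_loss, weights.copy(), 0
        else:
            stall += 1
            if stall >= patience:               # early stopping on validation loss
                break
    return best_weights
```

In this sketch the basis models are trained once up front and only their predictions enter the optimization, so the number of effective models grows one per round and early stopping controls the capacity of the learned combination.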
