Statistical Optimization in High Dimensions

We consider optimization problems whose parameters are known only approximately, through noisy samples. Of particular interest is the high-dimensional regime, where the number of samples is roughly equal to the dimensionality of the problem and the noise magnitude may greatly exceed that of the signal itself. This setting falls far outside the traditional scope of robust and stochastic optimization. We propose three algorithms to address it, combining ideas from statistics, machine learning, and robust optimization. In the important case where noise artificially increases the dimensionality of the parameters, we show that combining robust optimization with dimensionality reduction can yield high-quality solutions at greatly reduced computational cost.
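The abstract does not spell out the three algorithms, so the following is only an illustrative sketch of the phenomenon named in the last sentence, not the authors' method. It assumes a low-rank parameter matrix observed with i.i.d. Gaussian noise (the sizes, the noise level, and the helper `truncate_rank` are all hypothetical choices made for illustration) and shows that the noise makes the observed matrix full rank, while a truncated SVD restores the low dimension and reduces the estimation error.

```python
import numpy as np


def truncate_rank(matrix, rank):
    """Return the best rank-`rank` approximation of `matrix` (truncated SVD)."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return (u[:, :rank] * s[:rank]) @ vt[:rank, :]


# Hypothetical problem sizes: sample count comparable to the dimension.
n_samples, dim, true_rank, noise_std = 120, 100, 3, 2.0
rng = np.random.default_rng(0)

# Low-rank "true" parameters (assumption made for this illustration).
signal = rng.standard_normal((n_samples, true_rank)) @ rng.standard_normal(
    (true_rank, dim)
)
# Noisy observation; the per-entry noise std exceeds the per-entry signal std.
noisy = signal + noise_std * rng.standard_normal((n_samples, dim))

# Dimensionality reduction step: keep only the leading singular directions.
denoised = truncate_rank(noisy, true_rank)

rank_signal = np.linalg.matrix_rank(signal)
rank_noisy = np.linalg.matrix_rank(noisy)
rank_denoised = np.linalg.matrix_rank(denoised)
err_raw = np.linalg.norm(noisy - signal)         # Frobenius error before reduction
err_reduced = np.linalg.norm(denoised - signal)  # Frobenius error after reduction

print("rank of signal / noisy / denoised:", rank_signal, rank_noisy, rank_denoised)
print("error reduced by truncation:", err_reduced < err_raw)
```

In this sketch the true rank is supplied by hand; in practice it would have to be estimated, and a downstream robust optimization problem would then be posed over the reduced parameters rather than the full noisy ones.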
