An adaptive stochastic optimization algorithm for resource allocation

We consider the classical problem of sequential resource allocation, in which a decision maker must repeatedly divide a budget between several resources, each with diminishing returns. This can be recast as a specific stochastic optimization problem where the objective is to maximize the cumulative reward or, equivalently, to minimize the regret. We construct an algorithm that is adaptive to the complexity of the problem, expressed in terms of the regularity of the returns of the resources, measured by the exponent in the Łojasiewicz inequality (or by their universal concavity parameter). Our parameter-independent algorithm recovers the optimal rates for strongly concave functions and the classical fast rates of multi-armed bandits (for linear reward functions). Moreover, the algorithm improves on existing results on stochastic optimization in this regret minimization setting for intermediate cases.
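
To make the notion of regret concrete, here is a minimal sketch under strong simplifying assumptions. It is not the adaptive algorithm of the paper. It assumes noise-free rewards of the hypothetical form a_k * sqrt(x_k) (one simple family with diminishing returns), a unit budget, and illustrative function names chosen here. For this family the best allocation has a closed form, so the regret of any fixed allocation rule can be computed directly.

```python
import math

def total_reward(weights, allocation):
    """Noise-free reward sum_k a_k * sqrt(x_k) for one round (assumed reward family)."""
    return sum(a * math.sqrt(x) for a, x in zip(weights, allocation))

def optimal_allocation(weights):
    """Maximizer of sum_k a_k * sqrt(x_k) over the simplex: x_k proportional to a_k^2.

    Assumes non-negative weights that are not all zero.
    """
    norm = sum(a * a for a in weights)
    return [a * a / norm for a in weights]

def cumulative_regret(weights, allocations):
    """Sum over rounds of (best achievable reward - reward of the played allocation)."""
    best = total_reward(weights, optimal_allocation(weights))
    return sum(best - total_reward(weights, x) for x in allocations)

# Two resources; a naive decision maker splits the budget evenly for 100 rounds.
weights = [3.0, 4.0]
uniform = [0.5, 0.5]
regret = cumulative_regret(weights, [uniform] * 100)
print(f"regret of the uniform split over 100 rounds: {regret:.3f}")
```

A rule that never changes its allocation pays the same gap every round, so its regret grows linearly with the horizon. The rates mentioned in the abstract concern algorithms whose regret grows more slowly than that.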
