Paper information - An adaptive stochastic optimization algorithm for resource allocation

An adaptive stochastic optimization algorithm for resource allocation

We consider the classical problem of sequential resource allocation, where a decision maker must repeatedly divide a budget between several resources, each with diminishing returns. This can be recast as a specific stochastic optimization problem where the objective is to maximize the cumulative reward, or equivalently to minimize the regret. We construct an algorithm that is adaptive to the complexity of the problem, expressed in terms of the regularity of the returns of the resources, measured by the exponent in the Łojasiewicz inequality (or by their universal concavity parameter). Our parameter-independent algorithm recovers the optimal rates for strongly concave functions and the classical fast rates of multi-armed bandits (for linear reward functions). Moreover, the algorithm improves existing results on stochastic optimization in this regret minimization setting for intermediate cases.
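To make the regret objective in the abstract concrete, here is a minimal sketch of the problem setting only; it is not the paper's algorithm. It assumes a noise-free toy model in which resource i returns a_i * sqrt(x_i) for a budget share x_i, which is one simple example of diminishing returns. The function names, the square-root model, and the weights are all hypothetical choices made for illustration.

```python
import math

def total_reward(weights, allocation):
    """Toy concave reward F(x) = sum_i a_i * sqrt(x_i) (assumed model, not from the paper)."""
    return sum(a * math.sqrt(x) for a, x in zip(weights, allocation))

def optimal_allocation(weights):
    """Maximizer of F over the simplex for this toy model.

    Equalizing the marginal returns a_i / (2 * sqrt(x_i)) across resources
    gives x_i proportional to a_i ** 2.
    """
    s = sum(a * a for a in weights)
    return [a * a / s for a in weights]

def cumulative_regret(weights, allocations):
    """Regret after T rounds: T * max F minus the sum of rewards actually collected."""
    best = total_reward(weights, optimal_allocation(weights))
    return sum(best - total_reward(weights, x) for x in allocations)

weights = [3.0, 4.0]   # hypothetical return scales of two resources
T = 100                # number of rounds

uniform_play = [[0.5, 0.5]] * T                   # split the budget evenly every round
optimal_play = [optimal_allocation(weights)] * T  # play the best split every round

regret_uniform = cumulative_regret(weights, uniform_play)
regret_optimal = cumulative_regret(weights, optimal_play)
```

In this toy model the even split accumulates regret that grows linearly in T, while the optimal split accumulates none. An adaptive algorithm of the kind the abstract describes would have to learn a good split from noisy observations, which this sketch does not attempt.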

Shie Mannor | Vianney Perchet | Xavier Fontaine
