Online Convex Programming and Generalized Infinitesimal Gradient Ascent

Convex programming involves a convex set F ⊆ R^n and a convex cost function c : F → R. The goal of convex programming is to find a point in F that minimizes c. In online convex programming, the convex set is known in advance, but at each step of a repeated optimization problem one must select a point in F before seeing that step's cost function. This models factory production, farm production, and many other industrial optimization problems in which the value of the items produced is unknown until after they have been constructed. We introduce an algorithm for this domain and apply it to repeated games, showing that it generalizes infinitesimal gradient ascent; our results imply that generalized infinitesimal gradient ascent (GIGA) is universally consistent.
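The sketch below illustrates the online protocol described above: a point in F is committed each round, the round's cost gradient is revealed only afterward, and the next point is obtained by a projected gradient step. It is a minimal illustration under assumptions not fixed by the text: the feasible set F is taken to be a Euclidean ball, the costs are supplied as gradient oracles, and the step sizes decay as 1/sqrt(t).

```python
# Minimal sketch of a projected online gradient update for online convex
# programming. Assumptions (not from the text): F is the Euclidean ball of
# radius R, per-round cost gradients are provided by the caller, and the
# step size at round t is 1/sqrt(t).

import numpy as np


def project_to_ball(x, radius):
    """Euclidean projection onto the ball {z : ||z|| <= radius}."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else (radius / norm) * x


def online_gradient_steps(grad_fns, dim, radius=1.0):
    """Play one point per round, then observe that round's cost gradient.

    grad_fns: sequence of callables; grad_fns[t](x) returns the gradient of
              the round-t cost at x (revealed only after x is chosen).
    Returns the list of points played.
    """
    x = np.zeros(dim)                      # arbitrary feasible starting point
    played = []
    for t, grad in enumerate(grad_fns, start=1):
        played.append(x.copy())            # commit to x before seeing the cost
        eta = 1.0 / np.sqrt(t)             # decaying step size
        x = project_to_ball(x - eta * grad(x), radius)
    return played


if __name__ == "__main__":
    # Toy usage: quadratic costs c_t(x) = ||x - z_t||^2 with drifting targets.
    rng = np.random.default_rng(0)
    targets = [0.5 * rng.normal(size=2) for _ in range(100)]
    grads = [(lambda x, z=z: 2.0 * (x - z)) for z in targets]
    points = online_gradient_steps(grads, dim=2)
    print("final point:", points[-1])
```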
