Online Learning for Adversaries with Memory: Price of Past Mistakes

The framework of online learning with memory naturally captures learning problems with temporal effects, and was previously studied in the experts setting. In this work, we extend the notion of learning with memory to the general Online Convex Optimization (OCO) framework, and present two algorithms that attain low regret. The first algorithm applies to Lipschitz-continuous loss functions, obtaining optimal regret bounds for both convex and strongly convex losses. The second algorithm also attains the optimal regret bounds and applies more broadly to convex losses without requiring Lipschitz continuity, yet is more complicated to implement. We complement the theoretical results with two applications: statistical arbitrage in finance, and multi-step-ahead prediction in statistics.
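
To make the setting concrete, the following is a minimal sketch of OCO with memory, not the paper's algorithm as stated in the abstract. It assumes a memory length of one, a hypothetical quadratic loss with a movement penalty (`memory_loss`), a unit-ball decision set, and a standard projected gradient step taken on the unary function obtained by repeating a single decision in every argument of the loss. All function names and the step size `eta` are illustrative assumptions.

```python
import numpy as np


def project_ball(x, radius=1.0):
    """Euclidean projection onto the ball of the given radius."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)


def memory_loss(x_prev, x_curr, target):
    """Hypothetical loss with memory 1: tracking error plus movement cost."""
    tracking = np.sum((x_curr - target) ** 2)
    movement = 0.5 * np.sum((x_curr - x_prev) ** 2)
    return tracking + movement


def unary_gradient(x, target):
    """Gradient of the unary loss memory_loss(x, x, target) = ||x - target||^2."""
    return 2.0 * (x - target)


def policy_regret(targets, eta):
    """Run projected gradient steps on the unary loss; return policy regret.

    Regret is measured against the best fixed decision, which pays
    memory_loss(x, x, target) in every round.
    """
    targets = np.asarray(targets, dtype=float)
    dim = targets.shape[1]
    x_prev = np.zeros(dim)
    x_curr = np.zeros(dim)
    learner_loss = 0.0
    for target in targets:
        # The learner pays a loss that depends on its last two decisions.
        learner_loss += memory_loss(x_prev, x_curr, target)
        x_next = project_ball(x_curr - eta * unary_gradient(x_curr, target))
        x_prev, x_curr = x_curr, x_next
    # For this quadratic loss, the best fixed point is the projected mean target.
    best = project_ball(targets.mean(axis=0))
    comparator_loss = sum(memory_loss(best, best, t) for t in targets)
    return learner_loss - comparator_loss


if __name__ == "__main__":
    rounds = 200
    fixed_targets = np.tile(np.array([0.5, 0.0]), (rounds, 1))
    print(policy_regret(fixed_targets, eta=0.1) / rounds)
```

The movement term is what makes past decisions matter: a learner that changes its decision pays for it in the next round, while the fixed comparator never does.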
