Machine Learning Algorithms with Applications in Finance

Online decision making and learning occur in a great variety of scenarios. The decisions involved may consist of stock trading, ad placement, route planning, picking a heuristic, or making a move in a game. Such scenarios also vary in the complexity of the environment or the opponent, the available feedback, and the nature of possible decisions. Remarkably, in the last few decades, the theory of online learning has produced algorithms that can cope with this rich set of problems. These algorithms have two very desirable properties. First, they make minimal and often worst-case assumptions on the nature of the learning scenario, which makes them robust. Second, their performance is guaranteed to converge to that of the best strategy in a benchmark set, a property referred to as regret minimization. This work deals both with the general theory of regret minimization and with its implications for pricing financial derivatives. One contribution to the theory of regret minimization is a trade-off result, which shows that some of the most important regret minimization algorithms are also guaranteed to have non-negative and even positive levels of regret for any sequence of plays by the environment. Another contribution provides improved regret minimization algorithms for scenarios in which the benchmark set of strategies has a high level of redundancy; these scenarios are captured in a model of dynamically branching strategies. The contributions to derivative pricing build on a reduction from the problem of pricing derivatives to the problem of bounding the regret of trading algorithms. They comprise regret minimization-based price bounds for a variety of financial derivatives, obtained by means of both existing algorithms and specially designed ones. Moreover, a direct method for converting the performance guarantees of general-purpose regret minimization algorithms into performance guarantees in a trading scenario is developed and used to derive novel lower and upper bounds on derivative prices.
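As a concrete illustration of regret minimization, the following minimal sketch runs the standard Hedge (exponential weights) algorithm against a fixed set of experts and reports the algorithm's cumulative loss alongside that of the best expert in hindsight. It is not taken from this work: the function name `hedge`, the learning rate `eta`, and the assumption that per-round losses lie in [0, 1] are choices made here for illustration only.

```python
import math


def hedge(loss_rounds, eta):
    """Run Hedge on a sequence of per-expert loss vectors.

    loss_rounds: list of rounds, each a list of losses in [0, 1] (assumed),
                 one entry per expert.
    eta:         positive learning rate (illustrative parameter).
    Returns (algorithm_loss, best_expert_loss); their difference is the regret.
    """
    n_experts = len(loss_rounds[0])
    cumulative = [0.0] * n_experts  # cumulative loss of each expert so far
    algorithm_loss = 0.0
    for losses in loss_rounds:
        # Weights are proportional to exp(-eta * cumulative loss).
        # Subtracting the minimum keeps the exponentials numerically stable.
        shift = min(cumulative)
        weights = [math.exp(-eta * (c - shift)) for c in cumulative]
        total = sum(weights)
        probs = [w / total for w in weights]
        # Expected loss of playing an expert drawn from probs.
        algorithm_loss += sum(p * l for p, l in zip(probs, losses))
        cumulative = [c + l for c, l in zip(cumulative, losses)]
    return algorithm_loss, min(cumulative)


if __name__ == "__main__":
    T, N = 200, 3
    # Arbitrary deterministic loss sequence with values in [0, 1].
    rounds = [[((7 * t + 3 * i) % 5) / 4.0 for i in range(N)] for t in range(T)]
    eta = math.sqrt(8 * math.log(N) / T)
    alg, best = hedge(rounds, eta)
    print("regret:", alg - best)
    print("bound: ", math.sqrt(T * math.log(N) / 2))
```

With losses in [0, 1] and `eta = sqrt(8 ln N / T)`, the textbook analysis of Hedge bounds the regret by `sqrt(T ln N / 2)` for every loss sequence, so the average regret per round vanishes as T grows. This is the sense in which the algorithm's performance converges to that of the best benchmark strategy.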
