Model Selection in Markovian Processes

When analyzing data that originated from a dynamical system, a common practice is to cast the problem in the well-known frameworks of Markov Decision Processes (MDPs) and Reinforcement Learning (RL). The state space in these formulations is usually chosen in some heuristic fashion, and the resulting MDP can then be used to simulate and predict data, as well as to indicate the best possible action in each state. The model chosen to characterize the data affects the complexity and accuracy of any further analysis we may wish to apply, yet few methods have been suggested that exploit the dynamic structure to select such a model. In this work we address the problem of using time series data to choose from a finite set of candidate discrete state spaces, where these spaces are constructed by a domain expert. We formalize the notion of model selection consistency in the proposed setup. We then discuss the difference between our proposed framework and the classical Maximum Likelihood (ML) framework, and give an example where ML fails. We then suggest alternative selection criteria and show that they are weakly consistent. We further define weak consistency for a model construction algorithm and present a simple algorithm that is weakly consistent. Finally, we test the performance of the suggested criteria and algorithm on both simulated and real-world data.
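
To illustrate the general setup described above, the following minimal Python sketch maps an observed time series into each candidate discrete state space (each mapping assumed to come from a domain expert) and scores the resulting Markov chain models. The BIC-style penalized likelihood used here is only a generic stand-in criterion for illustration, not one of the criteria proposed in the paper (which, as noted, differ from the classical ML approach); all function names and the toy data are hypothetical.

```python
# Sketch: choosing among candidate discrete state spaces for a time series.
# The scoring rule below (penalized Markov likelihood) is a generic stand-in,
# not the paper's proposed criteria.
import math
from collections import defaultdict

def markov_log_likelihood(states):
    """Log-likelihood of a state sequence under its empirical transition matrix."""
    counts = defaultdict(lambda: defaultdict(int))
    for s, s_next in zip(states, states[1:]):
        counts[s][s_next] += 1
    ll = 0.0
    for row in counts.values():
        total = sum(row.values())
        for c in row.values():
            ll += c * math.log(c / total)
    return ll

def bic_score(states, n_states):
    """Penalized likelihood; higher is better (stand-in selection criterion)."""
    n_transitions = max(len(states) - 1, 1)
    k = n_states * (n_states - 1)            # free transition parameters
    return markov_log_likelihood(states) - 0.5 * k * math.log(n_transitions)

def select_state_space(observations, candidate_maps):
    """Return the name of the candidate mapping (observation -> state) with the best score."""
    scores = {}
    for name, phi in candidate_maps.items():
        states = [phi(o) for o in observations]
        scores[name] = bic_score(states, n_states=len(set(states)))
    return max(scores, key=scores.get), scores

if __name__ == "__main__":
    # Toy example: compare a coarse and a fine discretization of a scalar series.
    series = [0.1, 0.4, 0.9, 0.8, 0.2, 0.3, 0.7, 0.95, 0.15, 0.5]
    candidates = {
        "coarse": lambda x: int(x > 0.5),        # 2 states
        "fine":   lambda x: min(int(x * 4), 3),  # 4 states
    }
    best, scores = select_state_space(series, candidates)
    print(best, scores)
```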
