Fast-Tracking Stationary MOMDPs for Adaptive Management Problems

Adaptive management is applied in conservation and natural resource management, and consists of making sequential decisions when the transition matrix is uncertain. Informally described as 'learning by doing', this approach aims to trade off between decisions that help achieve the objective and decisions that will yield a better knowledge of the true transition matrix. When the true transition matrix is assumed to be an element of a finite set of possible matrices, solving a mixed observability Markov decision process (MOMDP) leads to an optimal trade-off but is very computationally demanding. Under the assumption (common in adaptive management) that the true transition matrix is stationary, we propose a polynomial-time algorithm to find a lower bound of the value function. In the corners of the domain of the value function (belief space), this lower bound is provably equal to the optimal value function. We also show that, under further assumptions, it is a linear approximation of the optimal value function in a neighborhood around the corners. We evaluate the benefits of our approach by using it to initialize the solvers MO-SARSOP and Perseus on a novel computational sustainability problem and a recent adaptive management data challenge. Our approach leads to an improved initial value function and translates into significant computational gains for both solvers.
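To make the setting concrete, the following is a minimal illustrative sketch (not the paper's algorithm) of the belief component of such a MOMDP: the true transition matrix is assumed to be one of a finite set of candidate matrices, and the belief over candidates is updated by Bayes' rule after each observed transition. The two candidate models and the `update_belief` helper below are hypothetical, chosen only to show the mechanics; the corners of the belief simplex, where the paper's lower bound is exact, correspond to certainty about a single candidate model.

```python
import numpy as np

def update_belief(belief, models, s, a, s_next):
    """Bayes update of the belief over candidate transition matrices.

    belief : shape (K,), prior probability of each candidate model
    models : shape (K, A, S, S), models[k, a, s, s'] = P(s' | s, a) under model k
    """
    # Likelihood of the observed transition under each candidate model
    likelihood = models[:, a, s, s_next]
    posterior = belief * likelihood
    return posterior / posterior.sum()

# Two hypothetical 2-state, 1-action candidate transition models
m0 = np.array([[[0.9, 0.1],
                [0.2, 0.8]]])   # model 0: strongly state-dependent dynamics
m1 = np.array([[[0.5, 0.5],
                [0.5, 0.5]]])   # model 1: uninformative dynamics
models = np.stack([m0, m1])

b = np.array([0.5, 0.5])        # uniform prior over the two models
b = update_belief(b, models, s=0, a=0, s_next=0)
# Observing the self-transition 0 -> 0 shifts belief toward model 0
```

Because the true matrix is stationary, the belief can only sharpen over time, which is the structural property the paper's polynomial-time lower bound exploits.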
