Information Gathering in Decentralized POMDPs by Policy Graph Improvement

Decentralized policies for information gathering are required when multiple autonomous agents are deployed to collect data about a phenomenon of interest without the ability to communicate. Decentralized partially observable Markov decision processes (Dec-POMDPs) are a general, principled model well suited for such decentralized multiagent decision-making problems. In this paper, we investigate Dec-POMDPs for decentralized information gathering problems. An optimal solution of a Dec-POMDP maximizes the expected sum of rewards over time. To encourage information gathering, we define the reward as a function of the agents' state information, for example its negative Shannon entropy. We prove that if the reward is convex, then the finite-horizon value function of the corresponding Dec-POMDP is also convex. We propose the first heuristic algorithm for information gathering Dec-POMDPs, and empirically demonstrate its effectiveness by solving problems an order of magnitude larger than those handled by the previous state of the art.
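As an illustration of the belief-dependent reward the abstract mentions, the sketch below computes the negative Shannon entropy of a belief distribution over states. This is a hypothetical minimal example, not the paper's implementation: the function name and the use of NumPy are assumptions. The reward is maximal (zero) for a fully concentrated belief and minimal for a uniform belief, which is what makes it suitable for encouraging information gathering.

```python
import numpy as np

def neg_entropy_reward(belief):
    """Negative Shannon entropy of a belief distribution (natural log).

    Returns 0 for a point-mass belief and -log(n) for a uniform belief
    over n states, so reducing uncertainty increases reward.
    Illustrative sketch only; names and conventions are assumptions.
    """
    b = np.asarray(belief, dtype=float)
    b = b / b.sum()          # normalize to a valid probability distribution
    nz = b[b > 0]            # by convention, 0 * log(0) contributes 0
    return float(np.sum(nz * np.log(nz)))
```

Since the negative entropy is a convex function of the belief, this reward falls under the class for which the paper proves convexity of the finite-horizon value function.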
