Multi-agent active information gathering in discrete and continuous-state decentralized POMDPs by policy graph improvement

Decentralized policies for information gathering are required when multiple autonomous agents are deployed to collect data about a phenomenon of interest and constant communication cannot be assumed. This is common when information is gathered by multiple independently operating sensor platforms that may be spread over large physical distances, such as unmanned aerial vehicles, or that operate in communication-limited environments, as in the case of autonomous underwater vehicles. In this paper, we frame the information gathering task as a general decentralized partially observable Markov decision process (Dec-POMDP). The Dec-POMDP is a principled model for cooperative decentralized multi-agent decision-making. An optimal solution of a Dec-POMDP is a set of local policies, one for each agent, that maximizes the expected sum of rewards over time. In contrast to most prior work on Dec-POMDPs, we set the reward to be a non-linear function of the agents' state information, for example the negative Shannon entropy. We argue that such reward functions are well suited to decentralized information gathering problems. We prove that if the reward function is convex, then the finite-horizon value function of the Dec-POMDP is also convex. We propose the first heuristic anytime algorithm for information gathering Dec-POMDPs, and empirically demonstrate its effectiveness by solving discrete problems an order of magnitude larger than those handled by the previous state of the art. We also propose an extension to continuous-state problems with finite action and observation spaces by employing particle filtering. The effectiveness of the proposed algorithms is verified in domains such as decentralized target tracking, scientific survey planning, and signal source localization.
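To make the belief-dependent reward concrete, the following minimal sketch shows what a convex, information-based reward such as the negative Shannon entropy looks like when evaluated on a filtered belief. It is an illustrative assumption rather than the authors' implementation: the function names and the toy transition and observation matrices are invented for this example, and the belief update shown is an ordinary discrete Bayes filter step.

```python
import numpy as np

def negative_entropy_reward(belief, eps=1e-12):
    """Negative Shannon entropy of a discrete belief vector.

    Values closer to zero indicate a more concentrated belief,
    i.e. more information about the hidden state.
    """
    b = np.asarray(belief, dtype=float)
    b = b / b.sum()  # normalize defensively
    return float(np.sum(b * np.log(b + eps)))

def belief_update(belief, T, Z, action, observation):
    """One Bayes filter step for a discrete belief.

    T[a] is the |S| x |S| transition matrix for joint action a,
    Z[a] is the |S| x |O| observation matrix for joint action a.
    """
    predicted = belief @ T[action]                    # prediction step
    unnormalized = predicted * Z[action][:, observation]
    return unnormalized / unnormalized.sum()          # correction step

# Example: a 3-state target-tracking toy problem with one joint action.
if __name__ == "__main__":
    belief = np.array([1/3, 1/3, 1/3])                # uniform prior
    T = {0: np.eye(3)}                                # static target
    Z = {0: np.array([[0.8, 0.1, 0.1],
                      [0.1, 0.8, 0.1],
                      [0.1, 0.1, 0.8]])}              # noisy position sensor
    belief = belief_update(belief, T, Z, action=0, observation=0)
    print(negative_entropy_reward(belief))            # higher than for the uniform prior
```

In the continuous-state extension described in the abstract, the belief would instead be represented by a weighted particle set, and a reward of this kind could be approximated, for instance, by binning the particles into a histogram before computing the entropy.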
