Interactive Learning in Continuous Multimodal Space: A Bayesian Approach to Action-Based Soft Partitioning and Learning

A probabilistic framework for interactive learning in continuous and multimodal perceptual spaces is proposed. In this framework, the agent learns the task together with an adaptive partitioning of its multimodal perceptual space. The learning process is formulated in a Bayesian reinforcement learning setting to facilitate the adaptive partitioning, which is built up gradually and softly using Gaussian distributions. The parameters of these distributions are adapted according to the agent's estimates of its actions' expected values. The probabilistic nature of the method yields experience generalization as well as robustness against uncertainty and noise. To exploit the different generalization behavior of different perceptual subspaces, learning is performed in parallel in multiple subspaces, including the original space. At every learning step, the policies learned in the subspaces are fused to select the final action. This concurrent learning in multiple spaces, combined with decision fusion, results in faster learning, the possibility of adding or removing sensors (i.e., gradual expansion or contraction of the perceptual space), and robustness against sensor failure or ambiguous sensor data. Results of two sets of simulations, together with several experiments, are reported to demonstrate the key properties of the framework.
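The abstract does not give the update equations, but the two core mechanisms it describes (soft Gaussian partitioning whose parameters track action values, and fusion of policies learned in parallel subspaces) can be illustrated with a minimal sketch. Everything below is an assumption for illustration only: the class and function names (SoftPartitionLearner, fused_action), the isotropic Gaussians, the TD-weighted mean adaptation, and the softmax-sum fusion rule are plausible stand-ins, not the paper's actual formulation.

```python
import numpy as np

class SoftPartitionLearner:
    """Q-learning over a soft Gaussian partitioning of one perceptual (sub)space.

    Each partition k is an isotropic Gaussian N(mu_k, sigma_k^2 I); a state's
    membership in partition k is its normalized responsibility, so every
    experience updates all partitions in proportion to how well they explain
    the state. This soft credit assignment is one source of the experience
    generalization the abstract refers to.
    """

    def __init__(self, n_partitions, dim, n_actions,
                 alpha=0.1, gamma=0.95, eta=0.01, rng=None):
        self.rng = rng or np.random.default_rng(0)
        # Assumes states are normalized to the unit cube [0, 1]^dim.
        self.mu = self.rng.uniform(0.0, 1.0, size=(n_partitions, dim))
        self.sigma = np.full(n_partitions, 0.3)       # isotropic widths
        self.q = np.zeros((n_partitions, n_actions))  # per-partition action values
        self.alpha, self.gamma, self.eta = alpha, gamma, eta

    def responsibilities(self, x):
        # Unnormalized Gaussian densities, then normalized soft memberships.
        d2 = np.sum((self.mu - x) ** 2, axis=1)
        w = np.exp(-0.5 * d2 / self.sigma ** 2)
        return w / (w.sum() + 1e-12)

    def action_values(self, x):
        # Soft-partition estimate: responsibility-weighted mix of partition Q-rows.
        return self.responsibilities(x) @ self.q

    def update(self, x, a, r, x_next):
        w = self.responsibilities(x)
        target = r + self.gamma * self.action_values(x_next).max()
        td = target - self.action_values(x)[a]
        # Credit every partition in proportion to its membership in x.
        self.q[:, a] += self.alpha * w * td
        # Pull partitions toward states where the chosen action proved
        # valuable, i.e., adapt the partitioning from action-value estimates.
        self.mu += (self.eta * w * max(td, 0.0))[:, None] * (x - self.mu)


def fused_action(learners, subspace_views, temperature=1.0):
    """Fuse the policies of parallel subspace learners: sum their
    softmax-normalized action preferences, then act greedily."""
    total = None
    for learner, view in zip(learners, subspace_views):
        v = learner.action_values(view) / temperature
        p = np.exp(v - v.max())
        p /= p.sum()
        total = p if total is None else total + p
    return int(np.argmax(total))
```

In this sketch, a learner is instantiated per perceptual subspace (each seeing its own slice of the full state vector) plus one for the original space; fused_action combines their preferences so that a failed or ambiguous sensor only degrades the subspaces that depend on it. Summing softmax preferences is just one of many possible fusion rules; the paper itself would have to be consulted for the actual Bayesian formulation.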
