Are People Successful at Learning Sequences of Actions on a Perceptual Matching Task?

We report the results of an experiment in which human subjects were trained to perform a perceptual matching task. Subjects were asked to manipulate comparison objects until they matched target objects using the fewest manipulations possible. An unusual feature of the experimental task is that efficient performance requires an understanding of the hidden or latent causal structure governing the relationships between actions and perceptual outcomes. We use two benchmarks to evaluate the quality of subjects' learning. One benchmark is based on optimal performance as calculated by a dynamic programming procedure. The other is based on an adaptive computational agent that uses a reinforcement-learning method known as Q-learning to learn to perform the task. Our analyses suggest that subjects were successful learners. In particular, by the end of training they performed the perceptual matching task in a near-optimal manner (i.e., using a small number of manipulations). Subjects were able to achieve near-optimal performance because they learned, at least partially, the causal structure underlying the task. In addition, subjects' performances were broadly consistent with those of model-based reinforcement-learning agents that built and used internal models of how their actions influenced the external environment. We hypothesize that people will achieve near-optimal performance on tasks requiring sequences of actions, especially sensorimotor tasks with underlying latent causal structures, when they can detect the effects of their actions on the environment, and when they can represent and reason about these effects using an internal mental model.
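
To illustrate the two benchmarks described above, the sketch below sets up a toy analogue of the matching task (a single wrap-around attribute that must be nudged to a target value in as few manipulations as possible; this is an illustrative assumption, not the actual experimental stimuli). It compares an optimal cost-to-go table computed by a dynamic programming procedure against the policy learned by a tabular Q-learning agent. All names, states, and parameter values are hypothetical and are not taken from the paper's implementation.

```python
import random

# Hypothetical toy analogue of the matching task: one attribute takes values
# 0..N-1, each action nudges it up or down (mod N), and the goal is to reach
# the target value GOAL using as few manipulations as possible.
N = 8
ACTIONS = (+1, -1)
GOAL = 0

def step(state, action):
    return (state + action) % N

# Benchmark 1: optimal number of manipulations via dynamic programming
# (value iteration on the known, deterministic transition model).
def optimal_steps():
    V = {s: float("inf") for s in range(N)}
    V[GOAL] = 0.0
    for _ in range(N):  # enough sweeps for this small state space
        for s in range(N):
            if s != GOAL:
                V[s] = 1.0 + min(V[step(s, a)] for a in ACTIONS)
    return V

# Benchmark 2: a model-free tabular Q-learning agent learning the same task
# from trial and error, with a -1 reward per manipulation until the match.
def q_learning(episodes=2000, alpha=0.1, gamma=0.95, epsilon=0.1):
    Q = {(s, a): 0.0 for s in range(N) for a in ACTIONS}
    for _ in range(episodes):
        s = random.randrange(N)
        while s != GOAL:
            if random.random() < epsilon:
                a = random.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda act: Q[(s, act)])
            s2 = step(s, a)
            future = 0.0 if s2 == GOAL else gamma * max(Q[(s2, b)] for b in ACTIONS)
            Q[(s, a)] += alpha * (-1.0 + future - Q[(s, a)])
            s = s2
    return Q

def greedy_steps(Q, s, limit=50):
    # Count manipulations when acting greedily with respect to the learned Q.
    steps = 0
    while s != GOAL and steps < limit:
        s = step(s, max(ACTIONS, key=lambda act: Q[(s, act)]))
        steps += 1
    return steps

if __name__ == "__main__":
    V = optimal_steps()
    Q = q_learning()
    for s in range(N):
        print(f"state {s}: optimal = {V[s]:.0f}, Q-learning greedy = {greedy_steps(Q, s)}")
```

With enough training episodes, the greedy Q-learning policy matches the dynamic programming benchmark on this toy task, which mirrors how subjects' manipulation counts were compared against both standards in the study.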
