This excerpt from Reinforcement Learning

is provided in screen-viewable form for personal use only by members of MIT CogNet. Unauthorized use or dissemination of this information is expressly forbidden.

1 Introduction

The idea that we learn by interacting with our environment is probably the first to occur to us when we think about the nature of learning. When an infant plays, waves its arms, or looks about, it has no explicit teacher, but it does have a direct sensorimotor connection to its environment. Exercising this connection produces a wealth of information about cause and effect, about the consequences of actions, and about what to do in order to achieve goals. Throughout our lives, such interactions are undoubtedly a major source of knowledge about our environment and ourselves. Whether we are learning to drive a car or to hold a conversation, we are acutely aware of how our environment responds to what we do, and we seek to influence what happens through our behavior. Learning from interaction is a foundational idea underlying nearly all theories of learning and intelligence.

In this book we explore a computational approach to learning from interaction. Rather than directly theorizing about how people or animals learn, we explore idealized learning situations and evaluate the effectiveness of various learning methods. That is, we adopt the perspective of an artificial intelligence researcher or engineer. We explore designs for machines that are effective in solving learning problems of scientific or economic interest, evaluating the designs through mathematical analysis or computational experiments. The approach we explore, called reinforcement learning, is much more focused on goal-directed learning from interaction than are other approaches to machine learning.

1.1 Reinforcement Learning

Reinforcement learning is learning what to do (how to map situations to actions) so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them. In the most interesting and challenging cases, actions may affect not only the immediate reward but also the next situation and, through that, all subsequent rewards. These two characteristics, trial-and-error search and delayed reward, are the two most important distinguishing features of reinforcement learning.

Reinforcement learning is defined not by characterizing learning methods, …
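
The two characteristics named in this section, trial-and-error search and delayed reward, can be made concrete with a small sketch. Everything in it is an assumption made for illustration and is not taken from the text: the two-step task, the state names `start` and `near_goal`, the actions `grab` and `wait`, the reward values, and the choice of learning rule (averaging the returns observed after each choice, with occasional random exploration). In this hypothetical task, grabbing at once pays 1, whereas waiting pays nothing immediately but leads to a situation in which grabbing pays 5.

```python
import random

# Hypothetical two-step task (not from the text), used only for illustration.
# TRANSITIONS[state][action] = (reward, next_state); next_state None ends the episode.
TRANSITIONS = {
    "start": {"grab": (1.0, None), "wait": (0.0, "near_goal")},
    "near_goal": {"grab": (5.0, None), "wait": (0.0, None)},
}


def run_trials(episodes=500, epsilon=0.1, seed=0):
    """Estimate the return of each (state, action) pair by trying actions."""
    rng = random.Random(seed)
    value = {(s, a): 0.0 for s, acts in TRANSITIONS.items() for a in acts}
    count = {key: 0 for key in value}

    for _ in range(episodes):
        state, visited, rewards = "start", [], []
        while state is not None:
            actions = list(TRANSITIONS[state])
            if rng.random() < epsilon:
                # Explore: occasionally try an action at random.
                action = rng.choice(actions)
            else:
                # Exploit: take the action with the highest current estimate.
                action = max(actions, key=lambda a: value[(state, a)])
            reward, next_state = TRANSITIONS[state][action]
            visited.append((state, action))
            rewards.append(reward)
            state = next_state

        # Credit each choice with all the reward that followed it, so that
        # delayed reward is attributed to the earlier action as well.
        ret = 0.0
        for key, reward in zip(reversed(visited), reversed(rewards)):
            ret += reward
            count[key] += 1
            value[key] += (ret - value[key]) / count[key]  # running average
    return value


def greedy_action(value, state):
    """Return the action with the highest estimated return in `state`."""
    return max(TRANSITIONS[state], key=lambda a: value[(state, a)])


if __name__ == "__main__":
    estimates = run_trials()
    for (state, action), v in sorted(estimates.items()):
        print(f"{state:>9} {action:<4} estimated return {v:.2f}")
```

The learner here is never told that waiting is the better first move. It finds this out only because exploration occasionally tries `wait`, and because the reward that arrives one step later is credited back to that earlier choice. With the settings assumed above, the estimate for waiting in `start` should end up well above the estimate of 1 for grabbing immediately.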