A Theoretical Model of Behavioral Shaping

Manu Chhabra¹ (mchhabra@cs.rochester.edu)
Daniel Stefankovic¹ (stefanko@cs.rochester.edu)
Robert A. Jacobs² (robbie@bcs.rochester.edu)
Departments of Computer Science¹ and Brain & Cognitive Sciences²
University of Rochester, Rochester, NY 14627 USA

Abstract

Behavioral shaping is an incremental training procedure commonly used to teach complex behaviors. Using this procedure, a learner is initially rewarded for producing coarse approximations of the target behavior. Over time, only more refined approximations are rewarded until, finally, the learner receives reward only when the target behavior is produced. In this paper, we mathematically formalize the notion of behavioral shaping in the context of search problems. In a search problem, an agent uses the membership oracle of a target concept to find a positive example of the concept. When the concepts are intervals on the real line, we show that the use of a shaping sequence (a sequence of increasingly restrictive concepts leading to the target concept) exponentially decreases the number of queries required to solve the search problem. We also show that no algorithm can solve the search problem using a smaller number of queries. Lastly, we conjecture that convexity may be an important requirement for a shaping procedure to be helpful.

Keywords: learning; training; shaping; computational learning theory

Introduction

Behavioral shaping is a training procedure commonly used to teach complex behaviors. Using this procedure, a complex task is taught to a learner in an incremental manner. The learner is initially rewarded for performing an easy task that coarsely resembles the target task that the teacher wants the learner to perform. Over time, the learner is rewarded for performing more difficult tasks that provide monotonically better approximations to the target task. At the end of the training sequence, the learner is rewarded only for performing the target task. Shaping was first proposed by B. F. Skinner in the 1930s (Skinner, 1938). In one experiment, Skinner demonstrated that pigeons could be trained to move in a circle. Initially, any movement to the left was rewarded. When the pigeon acquired this behavior, only larger movements were rewarded, and so on. Eventually, the pigeon learned to move in a full circle. In recent decades, shaping has been used to train animals (including people) to perform tasks that they would not learn to perform through direct reinforcement (i.e., by rewarding only the target behavior). Shaping has also been used in the field of artificial intelligence, especially in the area of machine learning known as reinforcement learning (Sutton & Barto, 1998), to train agents to perform complex tasks (Dorigo & Colombetti, 1994; Ng, Harada, & Russell, 1999; Randløv, 2000).

This paper has two goals. The primary goal is to mathematically formalize the notion of shaping, and to show that shaping makes certain tasks easier to learn. We specifically concentrate on tasks requiring search, in which the objective of an agent (either biological or artificial) is to find states in a search space which are rewarded. Intuitively, we ask whether searching for a reward state is easier when a teacher is available to guide the search. The secondary goal is to make a methodological contribution to the field of Cognitive Science by illustrating how concepts and techniques from the field of Computational Learning Theory (Anthony & Biggs, 1992; Kearns & Vazirani, 1994) can be used to formalize and study cognitive phenomena. A more detailed description of this work, and several additional results, can be found in Chhabra, Jacobs, and Stefankovic (2007).

Consider the following task in which an agent has to find a reward state in a one-dimensional array of states. There is an array of size n whose elements are filled with the values of a reward function; i.e., the elements are filled with zeros, except at some random, fixed location with index T (for target), which contains a one. The agent's goal is to find this reward location. The agent can query any element of the array. Clearly, Θ(n) queries are needed in the worst case to find the reward location.

Consider, now, the availability of a teacher to guide the agent's search. The learning process proceeds in iterations. At iteration 1, the teacher fills a contiguous region of size n/2 in the array with 1s (with the constraint that the actual target reward location T is within this region). All other elements in the array are assigned a zero. When the agent queries an element containing a 1, the current iteration ends and the learning process moves to the next iteration. At iteration 2, the teacher reduces the region containing 1s to a contiguous subregion of size n/4 (the subregion at iteration 2 is a subset of the subregion at iteration 1, with the constraint that it also contains location T). Eventually, after ⌈log₂ n⌉ iterations, the teacher assigns a 1 only to element T, the actual target reward location. Assuming that the agent knows that the teacher will shrink the "reward area" by half at each iteration, it is easy to show that the reward location can be found with just O(log₂ n) queries. Importantly, this is an exponential improvement in performance relative to the case when no teacher is available to guide the agent's search.
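The teacher-guided array search described above can be simulated directly. The paper does not specify a particular agent algorithm, so the strategy below (probing the residue class of the agent's last hit, which the shrinking region must preserve) is our own sketch of one way to achieve O(log₂ n) queries; all function names are hypothetical.

```python
import random

def guided_search(n, target, rng):
    """Simulate the shaping protocol: the teacher keeps a contiguous
    'reward region' containing the target and halves it after every hit;
    the agent finds the target in O(log2 n) membership queries.
    n must be a power of two, n >= 4."""
    size = n // 2
    # Teacher places R_1: a random block of n/2 cells containing the target.
    lo = rng.randint(max(0, target - size + 1), min(target, n - size))
    queries = 0

    # Iteration 1: any n/2 consecutive cells contain exactly one cell of
    # each residue class mod n/2, so probing n/4 and 3n/4 guarantees a hit.
    hit = None
    for cand in (n // 4, 3 * n // 4):
        queries += 1
        if lo <= cand < lo + size:
            hit = cand
            break
    assert hit is not None

    while size > 1:
        size //= 2
        # Teacher halves the region, keeping the target inside the old one.
        lo = rng.randint(max(lo, target - size + 1), min(target, lo + size))
        # Agent: the new region sits inside the old one, which contained the
        # last hit, so the unique cell of the new region congruent to
        # hit (mod size) must be one of these three candidates.
        for cand in (hit - size, hit, hit + size):
            queries += 1
            if lo <= cand < lo + size:
                hit = cand
                break
        else:
            raise AssertionError("agent strategy failed")
    return hit, queries
```

For n = 1024 this uses at most 2 + 3·9 = 29 queries (two probes in iteration 1, then at most three per halving), as against the roughly n queries an unguided agent needs in the worst case.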
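The continuous version from the abstract, where concepts are intervals on the real line, admits a similar sketch: given a shaping sequence of nested intervals in [0, 1], each half the length of its predecessor, an agent can reach the target interval with a constant number of membership queries per stage. The four-probe grid strategy below is one possible implementation under our own assumptions, not necessarily the algorithm analyzed in the paper.

```python
import random

def make_shaping_sequence(t, eps, rng):
    """Build a shaping sequence S_1 >= S_2 >= ... of closed intervals in
    [0, 1]: each is half the length of its predecessor and contains the
    point t; the sequence stops once the length reaches eps."""
    lo, hi, length = 0.0, 1.0, 1.0
    seq = [(lo, hi)]
    while length > eps:
        length /= 2.0
        # Pick a half-length sub-interval of the current one containing t.
        lo = rng.uniform(max(lo, t - length), min(hi - length, t))
        hi = lo + length
        seq.append((lo, hi))
    return seq

def shaped_search(seq):
    """Find a point of the final (target) interval with at most four
    membership queries per stage of the shaping sequence."""
    x, queries = 0.5, 0            # 0.5 is a known point of S_1 = [0, 1]
    for i in range(1, len(seq)):
        lo, hi = seq[i]
        L = seq[i - 1][1] - seq[i - 1][0]   # length of the previous stage
        # The next interval has length L/2 and lies in [x - L, x + L],
        # since it is a subset of the previous interval, which contains x.
        # A closed interval of length L/2 must contain a point of a grid
        # with spacing L/2, so one of these four probes must succeed.
        for cand in (x - 3*L/4, x - L/4, x + L/4, x + 3*L/4):
            queries += 1
            if lo <= cand <= hi:   # membership oracle of the next concept
                x = cand
                break
        else:
            raise AssertionError("grid argument failed")
    return x, queries
```

With a target interval of length eps = 2⁻²⁰, this uses at most 4·20 = 80 queries, whereas an agent given only the final interval's membership oracle needs on the order of 1/eps ≈ 10⁶ queries, which illustrates the exponential gap.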