Learning with Delayed Reinforcement Through Attention-Driven Buffering

Learning with delayed reinforcement refers to situations in which reinforcement reaches a learning system only at the end of a sequence of actions or outputs and must then be assigned back to the actions responsible for it. A method for accomplishing this is presented which buffers a small number of past actions, retaining each according to how unpredictable it is, or how much attention it draws, as it occurs. This approach allows the buffer to remain small while learning can still reach indefinitely far back into the past; it also allows the system to learn when reinforcement is not only delayed but also interleaved with reinforcements from other, unrelated actions that arrive during the delay. A simulated food-finding creature is used to show the system at work in a predictive application where reinforcements exhibit this interleaving behaviour.
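
The buffering idea can be illustrated with a minimal sketch. It is not the method as implemented in this work, and everything in it is an assumption made for illustration: the names AttentionBuffer, observe and assign_credit are hypothetical, the attention score is assumed to be supplied by some external predictor, and the rule that splits a reinforcement among buffered actions in proportion to their attention is one simple choice among many.

```python
import heapq
from dataclasses import dataclass, field
from itertools import count


@dataclass(order=True)
class BufferedAction:
    attention: float                    # surprise score (assumed supplied by a predictor)
    order: int                          # arrival index; breaks ties in favour of newer items
    action: str = field(compare=False)  # label of the action, not used for ordering


class AttentionBuffer:
    """Hypothetical fixed-size buffer keeping the highest-attention past actions."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._heap = []          # min-heap: the least surprising item sits at index 0
        self._counter = count()

    def observe(self, action: str, attention: float) -> None:
        """Record an action; it displaces the least surprising item only if it outranks it."""
        item = BufferedAction(attention, next(self._counter), action)
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, item)
        elif item > self._heap[0]:
            heapq.heapreplace(self._heap, item)

    def contents(self) -> list:
        """Return the buffered action labels in order of arrival."""
        return [b.action for b in sorted(self._heap, key=lambda b: b.order)]

    def assign_credit(self, reinforcement: float, values: dict, rate: float = 0.1) -> None:
        """Share a delayed reinforcement among buffered actions.

        Assumption: credit is proportional to each action's attention score.
        """
        total = sum(b.attention for b in self._heap)
        if total == 0:
            return
        for b in self._heap:
            share = b.attention / total
            values[b.action] = values.get(b.action, 0.0) + rate * reinforcement * share


if __name__ == "__main__":
    buf = AttentionBuffer(capacity=2)
    buf.observe("turn_left", 0.9)        # surprising, so it is retained
    for _ in range(1000):
        buf.observe("step_forward", 0.01)  # routine actions rarely displace it
    values = {}
    buf.assign_credit(reinforcement=1.0, values=values)
    print(buf.contents(), values)
```

In this sketch an early, surprising action survives an arbitrarily long run of routine actions, which is how a small buffer can still reach far back into the past. The sketch does not attempt to separate interleaved reinforcements from unrelated actions.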