Learning to Generate Focus Trajectories for Attentive Vision

One motivation of this paper is to provide an alternative to inefficient, purely static 'neural' approaches to visual target detection by introducing a more efficient sequential approach. The latter is inspired by the observation that biological systems employ sequential eye movements for pattern recognition. The other motivation is to demonstrate that there is at least one principle which can lead to the learning of dynamic selective spatial attention. A system consisting of an adaptive 'model network' interacting with a dynamic adaptive 'control network' is described. The system learns to generate focus trajectories such that the final position of a moving focus corresponds to a target to be detected in a visual scene. The difficulty is that no teacher provides the desired activations of the 'eye muscles' at various times. The only goal information is the desired final input corresponding to the target. Thus the task involves a complex temporal credit assignment problem as well as an attention-shifting problem. It is demonstrated experimentally that the system is able to learn correct sequences of focus movements involving translations and rotations. The system also learns to track moving targets. Some implications for attentive systems in general are discussed. For instance, one can build a 'mental focus' which operates on the set of internal representations of a neural system. It is suggested that self-referential systems which model the consequences of their own 'mental focus shifts' open the door for introspective learning in neural networks.