VISOR: Schema-based scene analysis with structured neural networks

A novel approach to object recognition and scene analysis, based on neural network representations of visual schemas, is described. Given an input scene, the VISOR system focuses attention on each component in turn, and the schema representations cooperate and compete to match the inputs. The schema hierarchy is learned from examples through unsupervised adaptation and reinforcement learning. VISOR learns that some objects are more important than others in identifying a scene, and that the importance of spatial relations varies from scene to scene. VISOR's recognition process remains robust as the inputs differ increasingly from the schemas, and it automatically generates a measure of confidence in the analysis.
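The matching process summarized above can be illustrated with a toy sketch. It is not VISOR's architecture: the function `analyze_scene`, the `SCHEMAS` table, and all weights are hypothetical. The weights are set by hand rather than learned, and spatial relations are ignored. The sketch only shows the general idea of attending to one component at a time, letting schemas accumulate evidence weighted by object importance, and reading a confidence value off the outcome of the competition.

```python
def analyze_scene(components, schemas):
    """Toy schema matching (illustrative only, not the VISOR network).

    components: list of object labels found in the scene.
    schemas: dict mapping a scene name to {object label: importance weight}.
    Returns (best scene name, confidence in [0, 1]).
    """
    activation = {name: 0.0 for name in schemas}
    for component in components:  # attend to one component at a time
        for name, weights in schemas.items():
            # A matching object supports a schema in proportion to its
            # (here hand-set, hypothetical) importance for that scene.
            activation[name] += weights.get(component, 0.0)
    total = sum(activation.values())
    if total == 0.0:
        return None, 0.0  # nothing in the scene matched any schema
    best = max(activation, key=activation.get)
    # Competition: the winner's share of the total activation serves as
    # a simple confidence measure.
    return best, activation[best] / total


# Hypothetical schemas; the weights are assumptions made for this example.
SCHEMAS = {
    "dining_table": {"table": 0.5, "plate": 0.3, "fork": 0.2},
    "office_desk": {"table": 0.5, "lamp": 0.3, "keyboard": 0.2},
}

full_label, full_conf = analyze_scene(["table", "plate", "fork"], SCHEMAS)
partial_label, partial_conf = analyze_scene(["table", "plate"], SCHEMAS)
```

In this toy setting, dropping the fork from the scene leaves the winning schema unchanged but lowers the confidence value. This loosely mirrors the behavior the abstract describes for inputs that deviate from the learned schemas.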
