Representing visual schemas in neural networks for scene analysis

Using object recognition in simple scenes as the task, two fundamental problems in neural network systems are addressed: (1) processing large amounts of input with limited resources, and (2) the representation and use of structured knowledge. The solution to the first problem is to process a small amount of the input in parallel and successively focus on other parts of the input. This strategy requires that the system maintain structured knowledge for describing and interpreting successively gathered information. The proposed system, VISOR (Visual Schemas for Object Representation), consists of two main modules. The low-level visual module extracts featural and positional information from the visual input. The schema module encodes structured knowledge about possible objects, and provides top-down information for the low-level visual module to focus attention on different parts of the scene. Working cooperatively with the low-level visual module, it builds a globally consistent interpretation of successively gathered visual information.
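The cooperative loop described above, in which a schema module directs attention and accumulates evidence from a low-level visual module across successive fixations, can be sketched in miniature. This is purely an illustrative assumption about the control flow; the function names, the dictionary-based scene and schema representations, and the matching score are hypothetical and not taken from the paper.

```python
# Hypothetical sketch of a two-module attention loop in the spirit of VISOR.
# All names and data structures here are illustrative assumptions.

def extract_features(scene, location):
    """Low-level visual module: featural and positional information
    gathered at a single fixation point."""
    return {"location": location, "feature": scene[location]}

def interpret(scene, schemas, fixations):
    """Schema module: directs attention to successive parts of the scene,
    accumulates the evidence, and returns the object schema most
    consistent with all observations gathered so far."""
    evidence = [extract_features(scene, loc) for loc in fixations]

    def score(schema):
        # Count fixations whose observed feature matches the schema's
        # expectation at that location.
        return sum(1 for e in evidence
                   if schema.get(e["location"]) == e["feature"])

    return max(schemas, key=lambda name: score(schemas[name]))

# Toy scene: one feature per location.
scene = {"top": "triangle", "bottom": "square"}
schemas = {
    "house": {"top": "triangle", "bottom": "square"},
    "tower": {"top": "square", "bottom": "square"},
}
print(interpret(scene, schemas, ["top", "bottom"]))  # → house
```

The key property the sketch preserves is that only one fixation's worth of input is processed at a time, while the schema module maintains the structured knowledge needed to combine the fixations into a globally consistent interpretation.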