Using an unsupervised learning procedure, a network is trained on an ensemble of images of the same two-dimensional object at different positions, orientations, and sizes. Each half of the network "sees" one fragment of the object and tries to produce as output a set of four parameters that have high mutual information with the four parameters output by the other half of the network. Given the ensemble of training patterns, the four parameters on which the two halves of the network can agree are the position (two coordinates), orientation, and size of the whole object, or some recoding of them. After training, the network can reject instances of other shapes, because the predictions made by its two halves then disagree. If two competing networks are trained on an unlabelled mixture of images of two objects, they cluster the training cases on the basis of the objects' shapes, independently of position, orientation, and size.
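The agreement objective described above can be illustrated with a minimal sketch. It does not train a network and is not presented as the authors' implementation. It assumes the two four-dimensional outputs are jointly Gaussian, so that their mutual information has the closed form I(a;b) = ½(log|Σ_a| + log|Σ_b| − log|Σ_ab|), and it replaces the two network halves with a hypothetical `simulate` function that returns a noisy copy of a random pose. The names `simulate`, `gaussian_mutual_information`, and `disagreement`, and the noise level, are all illustrative assumptions.

```python
import math
import random


def covariance(rows):
    """Sample covariance matrix of a list of equal-length vectors."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    for r in rows:
        c = [r[j] - mean[j] for j in range(d)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += c[i] * c[j] / (n - 1)
    return cov


def log_det(m):
    """Log-determinant of a symmetric positive-definite matrix via Cholesky."""
    d = len(m)
    chol = [[0.0] * d for _ in range(d)]
    total = 0.0
    for i in range(d):
        for j in range(i + 1):
            s = m[i][j] - sum(chol[i][k] * chol[j][k] for k in range(j))
            if i == j:
                chol[i][j] = math.sqrt(s)
                total += 2.0 * math.log(chol[i][j])
            else:
                chol[i][j] = s / chol[j][j]
    return total


def gaussian_mutual_information(a, b):
    """Mutual information in nats between two sets of output vectors,
    ASSUMING they are jointly Gaussian (an assumption of this sketch)."""
    joint = [x + y for x, y in zip(a, b)]  # concatenate the two 4-vectors
    return 0.5 * (log_det(covariance(a)) + log_det(covariance(b))
                  - log_det(covariance(joint)))


def disagreement(a, b):
    """Mean squared distance between the two halves' outputs."""
    return sum(sum((x - y) ** 2 for x, y in zip(u, v))
               for u, v in zip(a, b)) / len(a)


def simulate(n, noise, same_object, rng):
    """Hypothetical stand-in for the two network halves: each returns a noisy
    copy of a 4-parameter pose (x, y, orientation, size). For a different
    shape, the second half's estimate is unrelated to the first's."""
    a, b = [], []
    for _ in range(n):
        pose = [rng.gauss(0, 1) for _ in range(4)]
        other = pose if same_object else [rng.gauss(0, 1) for _ in range(4)]
        a.append([p + rng.gauss(0, noise) for p in pose])
        b.append([p + rng.gauss(0, noise) for p in other])
    return a, b


rng = random.Random(0)
a_same, b_same = simulate(2000, 0.1, True, rng)
a_other, b_other = simulate(2000, 0.1, False, rng)

mi_same = gaussian_mutual_information(a_same, b_same)
mi_other = gaussian_mutual_information(a_other, b_other)
gap_same = disagreement(a_same, b_same)
gap_other = disagreement(a_other, b_other)

print(f"MI (same object):  {mi_same:.2f} nats, disagreement {gap_same:.2f}")
print(f"MI (other shape):  {mi_other:.2f} nats, disagreement {gap_other:.2f}")
```

Under these assumptions, the mutual information is high only when both halves encode the same underlying pose, and the squared disagreement between the halves gives a simple score that could be thresholded to reject other shapes.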