A theoretical study of recognition of moving objects by monocular vision

We assume that a human recognizes the motion and three-dimensional structure of an object from feature quantities, and their changes, extracted from the retinal image, which is the projection of the three-dimensional object onto the retina. Under this assumption, human recognition of a moving object proceeds in two stages. First, primitive recognition takes place independently in each local area of the retina, based on the change of the feature quantities in that area; this is local, parallel information processing. Second, attention shifts to a higher level of recognition, which integrates all the local primitive recognitions. This paper considers local linear features of the object. Specifically, we consider an infinitesimal planar patch on the surface of the three-dimensional object and clarify the transformation of the two-dimensional image caused by the motion of the object. This transformation depends on both the motion of the object and its three-dimensional structure. We also clarify the law governing the transformation of the linear features obtained from the image. On this basis, we show that the motion and three-dimensional structure of the object can be recognized without identifying corresponding points. In addition, we state the condition that the characteristic functions must satisfy, and we give concrete computational methods for obtaining the motion and the three-dimensional structure.
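
To make concrete the statement that the image transformation of a small planar patch depends on both the motion and the three-dimensional structure, the following is a minimal sketch using the standard textbook motion field of a rigidly moving plane under perspective projection. It is not the linear-feature method of this paper. All names are hypothetical, and the setup is an assumption: unit focal length, image coordinates x = X/Z and y = Y/Z, a patch on the plane n . P = d, and instantaneous rigid velocity T + w x P.

```python
def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def inverse_depth(x, y, n, d):
    """1/Z of the plane n . P = d seen at image point (x, y).

    Follows from P = Z * (x, y, 1); this term carries the structure.
    """
    return (n[0] * x + n[1] * y + n[2]) / d


def planar_flow(x, y, T, w, n, d):
    """Image velocity (u, v) at (x, y) for a plane moving with
    translational velocity T and angular velocity w (assumed model)."""
    q = inverse_depth(x, y, n, d)
    u = (T[0] - x * T[2]) * q + w[1] - w[2] * y - w[0] * x * y + w[1] * x * x
    v = (T[1] - y * T[2]) * q - w[0] + w[2] * x - w[0] * y * y + w[1] * x * y
    return u, v


def finite_difference_flow(x, y, T, w, n, d, dt=1e-6):
    """Numerical check: move the 3D point for a short time and reproject."""
    Z = 1.0 / inverse_depth(x, y, n, d)
    P = (x * Z, y * Z, Z)
    r = cross(w, P)
    P2 = tuple(P[i] + dt * (T[i] + r[i]) for i in range(3))
    return ((P2[0] / P2[2] - x) / dt, (P2[1] / P2[2] - y) / dt)
```

In this model the translation T enters only through products with the inverse depth, so the plane parameters n and d (the structure) and the motion jointly determine the local image transformation, while the rotation terms are independent of structure.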
