Artificial 3D Vision

1 Introduction

In this talk, we discuss the following problem: suppose we have a mobile robot moving around in an otherwise unknown man-made environment. Is it possible to let it wander around and incrementally build, using passive Vision, a three-dimensional representation which:

1. does not grow too large even if many measurements are accumulated (the capability of "intelligently forgetting"), and
2. is accurate even though the motion of the robot is not accurately known (that is, it converges over time toward the "real world" description)?

We have made some progress toward the solution of this problem, whose applications to the field of Robotics should be obvious. This presentation is in four parts. In the first part, we detail the basic assumptions and techniques that have allowed us to come up with efficient solutions to the problem of building a local 3D map from Stereo Vision and Structure from Motion. In the second part, we show how the motion of the vehicle can be computed accurately by Visual Motion based techniques. In the third part, we present a purely geometric approach to the problem of combining several viewpoints into a single surface and volume representation of the environment. In the fourth part, we present a solution to the same problem that takes into account the Uncertainty in the visual measurements and the motion of the robot.

This work was partially supported by Esprit Project P940.

In our work, we have followed two guiding lights. The first is Geometry: a Vision System is mostly a geometric engine, and ours deals directly with geometric entities which are constructed from the visual inputs. The second guiding light is that of Uncertainty. Uncertainty is always present in the real world and cannot be engineered away. Therefore, it must be present explicitly in the representations which are manipulated by the system.
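
The two requirements stated in this introduction, a representation that does not grow with the number of measurements and one that carries its Uncertainty explicitly, can be illustrated with a small sketch. It is not the system described in this talk. It assumes independent measurements of a single 3D point, a diagonal covariance (one variance per axis), and a robot whose motion is perfectly known. The names `UncertainPoint`, `fuse`, and `fuse_all` are hypothetical and chosen only for this illustration.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class UncertainPoint:
    """A 3D point estimate with one variance per axis (assumed diagonal covariance)."""
    mean: Vec3
    var: Vec3  # each entry must be strictly positive


def fuse(a: UncertainPoint, b: UncertainPoint) -> UncertainPoint:
    """Merge two independent estimates of the same point, axis by axis."""
    mean = []
    var = []
    for ma, va, mb, vb in zip(a.mean, a.var, b.mean, b.var):
        gain = va / (va + vb)               # trust b more when a is more uncertain
        mean.append(ma + gain * (mb - ma))  # inverse-variance weighted mean
        var.append(va * vb / (va + vb))     # smaller than both va and vb
    return UncertainPoint((mean[0], mean[1], mean[2]), (var[0], var[1], var[2]))


def fuse_all(measurements: Sequence[UncertainPoint]) -> UncertainPoint:
    """Fold a non-empty sequence of measurements into one estimate of constant size."""
    estimate = measurements[0]
    for m in measurements[1:]:
        estimate = fuse(estimate, m)
    return estimate


# Four noisy sightings of the same (hypothetical) corner, all with unit variance.
sightings = [
    UncertainPoint((1.0, 2.0, 3.0), (1.0, 1.0, 1.0)),
    UncertainPoint((1.2, 1.9, 3.1), (1.0, 1.0, 1.0)),
    UncertainPoint((0.9, 2.1, 2.8), (1.0, 1.0, 1.0)),
    UncertainPoint((1.1, 2.0, 3.1), (1.0, 1.0, 1.0)),
]
corner = fuse_all(sightings)
print(corner.mean, corner.var)
```

However many sightings are folded in, the stored estimate stays one mean and one variance per axis, and each variance shrinks with every fusion. Handling an uncertain robot motion, as the fourth part of the talk does, would require more than this sketch shows.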

2 Building 3D maps from passive Vision

We exploit the fact that in a man-made environment many polyhedral or nearly polyhedral objects are present. Therefore, it is natural to adopt a representation that is based on linear geometric primitives: points, lines, and planes. The line primitives are computed immediately from the input images and then used as tokens in the Stereo and Motion Matchers that compute 3D Geometry and Motion. We have developed several Stereo matchers, which all use the basic paradigm of Hypothesis Prediction and Testing to match 2D line segments [5,6] and differ in the number of camera inputs they use. The fastest, most reliable, and most accurate matcher is the one using three cameras. All of them rest on the idea of establishing the matches not on the images themselves but on symbolic representations of these images. These symbolic representations are neighborhood graphs of line segments extracted from the images. The matchers can accommodate any camera geometry thanks to a powerful calibration technique described in [11], from which the epipolar geometry can be easily computed. Thanks to this, they