A Probabilistic Framework for Matching Temporal Trajectories: CONDENSATION-Based Recognition of Gestures and Expressions

The recognition of human gestures and facial expressions in image sequences is an important and challenging problem that enables a host of human-computer interaction applications. This paper describes a framework for incremental recognition of human motion that extends the "Condensation" algorithm proposed by Isard and Blake (ECCV'96). Human motions are modeled as temporal trajectories of some estimated parameters over time. The Condensation algorithm uses random sampling techniques to incrementally match the trajectory models to the multi-variate input data. The recognition framework is demonstrated with two examples. The first example involves an augmented office whiteboard with which a user can make simple hand gestures to grab regions of the board, print them, save them, etc. The second example illustrates the recognition of human facial expressions using the estimated parameters of a learned model of mouth motion.
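The sampling-based matching described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a scalar observation per frame (the paper works with multi-variate data), a Gaussian measurement error, and a hypothetical sample state of model index, phase, and playback rate. The names `condensation_match` and `interp`, the noise levels, and the `window` and `sigma` defaults are all illustrative choices.

```python
import math
import random

def interp(model, pos):
    """Linearly interpolate a 1-D model trajectory at fractional index pos."""
    pos = min(max(pos, 0.0), len(model) - 1.0)
    lo = int(pos)
    hi = min(lo + 1, len(model) - 1)
    return model[lo] + (pos - lo) * (model[hi] - model[lo])

def condensation_match(models, observations, n_samples=500, window=5,
                       sigma=0.2, seed=0):
    """Return, for each time step, the estimated probability of each model."""
    rng = random.Random(seed)
    # Hypothetical sample state: [model index, phase in model, playback rate].
    samples = [[rng.randrange(len(models)), 0.0, rng.uniform(0.8, 1.2)]
               for _ in range(n_samples)]
    posteriors = []
    for t in range(len(observations)):
        # Measure: score each sample against the most recent observations.
        log_w = []
        for m, phase, rate in samples:
            err = 0.0
            for j in range(min(window, t + 1)):
                pred = interp(models[m], phase - rate * j)
                err += ((observations[t - j] - pred) / sigma) ** 2
            log_w.append(-0.5 * err)
        top = max(log_w)  # subtract the maximum to avoid underflow
        weights = [math.exp(lw - top) for lw in log_w]
        total = sum(weights)
        # Sum the normalized weights per model to get its probability.
        post = [0.0] * len(models)
        for (m, _, _), w in zip(samples, weights):
            post[m] += w / total
        posteriors.append(post)
        # Select: resample in proportion to weight.
        chosen = rng.choices(samples, weights=weights, k=n_samples)
        # Predict: advance each phase by its rate and add diffusion noise.
        samples = [[m, phase + rate + rng.gauss(0.0, 0.1),
                    min(max(rate + rng.gauss(0.0, 0.02), 0.5), 2.0)]
                   for m, phase, rate in chosen]
    return posteriors

# Usage: two toy trajectory models; the input follows the first one.
ts = [i / 29 for i in range(30)]
rise = [math.sin(math.pi * x) for x in ts]
fall = [-v for v in rise]
posteriors = condensation_match([rise, fall], rise)
print(posteriors[-1])
```

Because each frame only reweights and resamples the existing sample set, the per-model probabilities are available incrementally, before the whole motion has been observed. In this toy run the final estimate should strongly favor the first model.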

[1] Michael Isard, et al. Learning to Track the Visual Motion of Contours, 1995, Artif. Intell.

[2] Michael J. Black, et al. The Robust Estimation of Multiple Motions: Parametric and Piecewise-Smooth Flow Fields, 1996, Comput. Vis. Image Underst.

[3] David C. Hogg, et al. Learning Flexible Models from Image Sequences, 1994, ECCV.

[4] Lawrence R. Rabiner, et al. A tutorial on hidden Markov models and selected applications in speech recognition, 1989, Proc. IEEE.

[5] Aaron F. Bobick, et al. Recognition and interpretation of parametric gesture, 1998, Sixth International Conference on Computer Vision (IEEE Cat. No.98CH36271).

[6] Quentin Stafford-Fraser, et al. BrightBoard: a video-augmented environment, 1996, CHI '96.

[7] David Sankoff, et al. Time Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison, 1983.

[8] Andrew Blake, et al. Learning Dynamics of Complex Motions from Image Sequences, 1996, ECCV.

[9] Lawrence R. Rabiner, et al. A tutorial on Hidden Markov Models, 1986.

[10] Justine Cassell, et al. Temporal classification of natural gesture and application to video coding, 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[11] Thad Starner, et al. Visual Recognition of American Sign Language Using Hidden Markov Models, 1995.

[12] Stuart J. Russell, et al. Stochastic simulation algorithms for dynamic probabilistic networks, 1995, UAI.

[13] Richard Szeliski, et al. Image mosaicing for tele-reality applications, 1994, Proceedings of 1994 IEEE Workshop on Applications of Computer Vision.

[14] Maxwell Bodenheim, et al. To I. B., 1917.

[15] Andrew Blake, et al. Real-Time Lip Tracking for Audio-Visual Speech Recognition Applications, 1996, ECCV.

[16] Michael Isard, et al. A mixed-state condensation tracker with automatic model-switching, 1998, Sixth International Conference on Computer Vision (IEEE Cat. No.98CH36271).

[17] Hiroshi Ishii, et al. Tangible bits: towards seamless interfaces between people, bits and atoms, 1997, CHI.

[18] Michael J. Black, et al. Parameterized Modeling and Recognition of Activities, 1999, Comput. Vis. Image Underst.

[19] Christoph Bregler, et al. Learning and recognizing human dynamics in video sequences, 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[20] Joseph B. Kruskal, et al. Time Warps, String Edits, and Macromolecules, 1999.

[21] David J. Fleet, et al. Learning parameterized models of image motion, 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[22] James L. Crowley, et al. Multi-modal tracking of faces for video communications, 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[23] Timothy F. Cootes, et al. Automatic interpretation of human faces and hand gestures using flexible models, 1995.

[24] Michael Isard, et al. Contour Tracking by Stochastic Propagation of Conditional Density, 1996, ECCV.