Machine Recognition of Objects

Definition

Machine recognition of objects is the task of locating and recognizing a given object in an image and consists of the following steps: object detection, feature extraction, and recognition.

Background

Early computer vision recognition schemes focused primarily on the recognition of rigid three-dimensional (3D) objects, such as machine parts, tools, and cars. This is a challenging problem because the same object can have markedly different appearances when viewed from different directions. It proved possible to deal successfully with this difficulty by using detailed 3D models of the viewed objects, which were compared with the projected 2D image (e.g., [14, 18, 33]).

Over the last decade or so, computational models have made significant progress in the task of recognizing natural object categories under realistic, relatively unconstrained viewing conditions. Within object recognition, it is common to distinguish two main tasks: identification, for instance, recognizing a specific face among other faces, and categorization, for example, recognizing a car among other object classes. We will discuss both of these tasks below and use "recognition" to include both.

The qualitative improvement in the performance of recognition models can be attributed to three main components. The first is the use of extensive learning in constructing recognition models. In this framework, rather than specifying a particular model, the scheme starts with a large family of possible models and uses observed examples to guide the construction of a specific model which is best suited to the observed data. The second component was the development of new forms of object representation for the purpose of categorization, based on both computational considerations and guidelines from known properties of the visual cortex. These two components, representation and learning, are interrelated: initially, the class representation provides a family of plausible models, and effective learning methods are then used to construct a particular model for a novel class such as "dog" or "airplane" based on observed examples. The third component was the use of new statistical learning techniques, such as regularization classifiers (SVM and others) and Bayesian inference (such as graphical models). We next discuss each of these advances in more detail.

… classification problem from examples, rather than focusing on the classifier design. This was a marked departure from the dominant practices at the time: instead of an expert program with a predetermined set of logical rules, the appropriate …
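
As a rough illustration of learning a classifier from examples with a regularization classifier, the sketch below fits a linear SVM-style model by subgradient descent on an L2-regularized hinge loss. It is a minimal sketch under stated assumptions, not the method of any cited work: the linear model family, the function names (train_linear_svm, predict), the parameters (lam, epochs, lr), and the toy two-dimensional feature vectors are all hypothetical.

```python
# Illustrative sketch only: a regularized linear classifier learned from examples.
# All names, parameters, and data below are hypothetical, not taken from the entry.

def train_linear_svm(examples, labels, lam=0.01, epochs=200, lr=0.1):
    """Minimize (lam / 2) * ||w||^2 + mean hinge loss by subgradient descent."""
    dim = len(examples[0])
    n = len(examples)
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        # Gradient of the regularization term.
        grad_w = [lam * wi for wi in w]
        grad_b = 0.0
        for x, y in zip(examples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # Only examples inside the margin contribute to the hinge subgradient.
            if margin < 1.0:
                for j in range(dim):
                    grad_w[j] -= y * x[j] / n
                grad_b -= y / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on which side of the learned hyperplane x falls."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Toy feature vectors standing in for image descriptors of two classes.
examples = [(2.0, 2.0), (3.0, 2.5), (2.5, 3.0),
            (-2.0, -2.0), (-3.0, -2.5), (-2.5, -3.0)]
labels = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(examples, labels)
print([predict(w, b, x) for x in examples])
```

Here the family of possible models is the set of all hyperplanes (w, b), the observed examples select one member of that family, and the lam term penalizes large weights so that the selected model does not simply memorize the training set.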

[1]  D. Marr,et al.  Representation and recognition of the spatial organization of three-dimensional shapes , 1978, Proceedings of the Royal Society of London. Series B. Biological Sciences.

[2]  N. Sloane,et al.  Hadamard transform optics , 1979 .

[3]  Irving Biederman,et al.  Human image understanding: Recent research and a theory , 1985, Comput. Vis. Graph. Image Process..

[4]  David G. Lowe,et al.  Three-Dimensional Object Recognition from Single Two-Dimensional Images , 1987, Artif. Intell..

[5]  M. Turk,et al.  Eigenfaces for Recognition , 1991, Journal of Cognitive Neuroscience.

[6]  Ronen Basri,et al.  Recognition by Linear Combinations of Models , 1991, IEEE Trans. Pattern Anal. Mach. Intell..

[7]  Shree K. Nayar Shape from focus system , 1992, Proceedings 1992 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[8]  Roberto Brunelli,et al.  Face Recognition: Features Versus Templates , 1993, IEEE Trans. Pattern Anal. Mach. Intell..

[9]  Aaron F. Bobick,et al.  A state-based technique for the summarization and recognition of gesture , 1995, Proceedings of IEEE International Conference on Computer Vision.

[10]  Yoav Freund,et al.  A decision-theoretic generalization of on-line learning and an application to boosting , 1995, EuroCOLT.

[11]  Marc Levoy,et al.  A volumetric method for building complex models from range images , 1996, SIGGRAPH.

[12]  James F. Blinn,et al.  Blue screen matting , 1996, SIGGRAPH.

[13]  Yoav Freund,et al.  A decision-theoretic generalization of on-line learning and an application to boosting , 1997, EuroCOLT.

[14]  Alex Pentland,et al.  Coupled hidden Markov models for complex action recognition , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[15]  Takeo Kanade,et al.  Virtualized Reality: Constructing Virtual Worlds from Real Scenes , 1997, IEEE Multim..

[16]  Tomaso A. Poggio,et al.  Example-Based Learning for View-Based Human Face Detection , 1998, IEEE Trans. Pattern Anal. Mach. Intell..

[17]  Theodoros Evgeniou,et al.  A TRAINABLE PEDESTRIAN DETECTION SYSTEM , 1998 .

[18]  Vladimir Vapnik,et al.  Statistical learning theory , 1998 .

[19]  Takeo Kanade,et al.  Neural Network-Based Face Detection , 1998, IEEE Trans. Pattern Anal. Mach. Intell..

[20]  Tomaso A. Poggio,et al.  A general framework for object detection , 1998, Sixth International Conference on Computer Vision (IEEE Cat. No.98CH36271).

[21]  Michael I. Jordan Graphical Models , 1998 .

[22]  Shimon Ullman,et al.  Combining Class-Specific Fragments for Object Classification , 1999, BMVC.

[23]  Subhasis Chaudhuri,et al.  Depth From Defocus: A Real Aperture Imaging Approach , 1999, Springer New York.

[24]  Carlo Tomasi,et al.  Alpha estimation in natural images , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[25]  Vladimir N. Vapnik,et al.  The Nature of Statistical Learning Theory , 2000, Statistics for Engineering and Information Science.

[26]  David Salesin,et al.  A Bayesian approach to digital matting , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[27]  Tomaso A. Poggio,et al.  Example-Based Object Detection in Images by Components , 2001, IEEE Trans. Pattern Anal. Mach. Intell..

[28]  M. Alex O. Vasilescu,et al.  Recognizing action events from multiple viewpoints , 2001, Proceedings IEEE Workshop on Detection and Recognition of Events in Video.

[29]  Felipe Cucker,et al.  On the mathematical foundations of learning , 2001 .

[30]  D. Sutherland The evolution of clinical gait analysis. Part II kinematics. , 2002, Gait & posture.

[31]  Michel Vidal-Naquet,et al.  Visual features of intermediate complexity and their use in classification , 2002, Nature Neuroscience.

[32]  Tai Sing Lee,et al.  Hierarchical Bayesian inference in the visual cortex. , 2003, Journal of the Optical Society of America. A, Optics, image science, and vision.

[33]  Rama Chellappa,et al.  View invariants for human action recognition , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[34]  Pietro Perona,et al.  A Bayesian approach to unsupervised one-shot learning of object categories , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[35]  T. Poggio,et al.  General conditions for predictivity in learning theory , 2004, Nature.

[36]  Marc Levoy,et al.  Synthetic aperture confocal imaging , 2004, ACM Trans. Graph..

[37]  Mubarak Shah,et al.  View-Invariant Representation and Recognition of Actions , 2002, International Journal of Computer Vision.

[38]  Yaser Sheikh,et al.  On the use of anthropometry in the invariant analysis of human actions , 2004, Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004..

[39]  Carlos Hernández Esteban,et al.  Modélisation d'objets 3D par fusion silhouettes-stéréo à partir de séquences d'images en rotation non calibrées. (Stereo and silhouette fusion for 3D object modeling from uncalibrated images under circular motion) , 2004 .

[40]  Takeo Kanade,et al.  Object Detection Using the Statistics of Parts , 2004, International Journal of Computer Vision.

[41]  Mark R. Stevens,et al.  Methods for Volumetric Reconstruction of Visual Scenes , 2004, International Journal of Computer Vision.

[42]  David G. Lowe  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[43]  Jian Sun,et al.  Poisson matting , 2004, ACM Trans. Graph..

[44]  David Salesin,et al.  Interactive digital photomontage , 2004, ACM Trans. Graph..

[45]  Pietro Perona,et al.  Learning Generative Visual Models from Few Training Examples: An Incremental Bayesian Approach Tested on 101 Object Categories , 2004, 2004 Conference on Computer Vision and Pattern Recognition Workshop.

[46]  Paul A. Viola,et al.  Robust Real-Time Face Detection , 2001, International Journal of Computer Vision.

[47]  Mikhail Belkin,et al.  Semi-Supervised Learning on Riemannian Manifolds , 2004, Machine Learning.

[48]  Kiriakos N. Kutulakos,et al.  A Theory of Shape by Space Carving , 2000, International Journal of Computer Vision.

[49]  A. Wuttig Optimal transformations for optical multiplex measurements in the presence of photon noise. , 2005, Applied optics.

[50]  Michael F. Cohen,et al.  An iterative optimization approach for unified image segmentation and matting , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[51]  P. Hanrahan,et al.  Light Field Photography with a Hand-held Plenoptic Camera , 2005 .

[52]  Paul Haeberli A Multifocus Method for Controlling Depth of Field , 2005 .

[53]  T. Poggio,et al.  The Mathematics of Learning: Dealing with Data , 2005, 2005 International Conference on Neural Networks and Brain.

[54]  Richard Szeliski,et al.  A Comparison and Evaluation of Multi-View Stereo Reconstruction Algorithms , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[55]  Amit K. Agrawal,et al.  Coded exposure photography: motion deblurring using fluttered shutter , 2006, ACM Trans. Graph..

[56]  Mohiuddin Ahmad,et al.  HMM-based Human Action Recognition Using Multiview Image Sequences , 2006, 18th International Conference on Pattern Recognition (ICPR'06).

[57]  Cordelia Schmid,et al.  Toward Category-Level Object Recognition , 2006, Toward Category-Level Object Recognition.

[58]  Michael Goesele,et al.  Multi-View Stereo Revisited , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[59]  Jean-Philippe Pons,et al.  Fast Level Set Multi-View Stereo on Graphics Hardware , 2006, Third International Symposium on 3D Data Processing, Visualization, and Transmission (3DPVT'06).

[60]  Adrian Hilton,et al.  A survey of advances in vision-based human motion capture and analysis , 2006, Comput. Vis. Image Underst..

[61]  Olivier D. Faugeras,et al.  Multi-View Stereo Reconstruction and Scene Flow Estimation with a Global Image-Based Matching Score , 2007, International Journal of Computer Vision.

[62]  Cordelia Schmid,et al.  Dataset Issues in Object Recognition , 2006, Toward Category-Level Object Recognition.

[63]  Yiannis Aloimonos,et al.  View-Invariant Modeling and Recognition of Human Actions Using Grammars , 2006, WDV.

[64]  Rémi Ronfard,et al.  Free viewpoint action recognition using motion history volumes , 2006, Comput. Vis. Image Underst..

[65]  Jan-Michael Frahm,et al.  Real-Time Visibility-Based Fusion of Depth Maps , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[66]  Ramakant Nevatia,et al.  Single View Human Action Recognition using Key Pose Matching and Viterbi Path Searching , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[67]  Frédo Durand,et al.  Image and depth from a conventional camera with a coded aperture , 2007, ACM Trans. Graph..

[68]  Roberto Cipolla,et al.  Multiview Stereo via Volumetric Graph-Cuts and Occlusion Robust Photo-Consistency , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[69]  Roberto Cipolla,et al.  Probabilistic visibility for multi-view stereo , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[70]  Felix Goldberg,et al.  Optimal multiplexed sensing: bounds, conditions and a graph theory link. , 2007, Optics express.

[71]  Yoav Y. Schechner,et al.  Illumination Multiplexing within Fundamental Limits , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[72]  Shree K. Nayar,et al.  Multiplexing for Optimal Lighting , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[73]  Shree K. Nayar,et al.  Multispectral Imaging Using Multiplexed Illumination , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[74]  Michael Goesele,et al.  Multi-View Stereo for Community Photo Collections , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[75]  Marc Pollefeys,et al.  Multi-View Stereo via Graph Cuts on the Dual of an Adaptive Tetrahedral Mesh , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[76]  J. Ponce,et al.  Accurate, Dense, and Robust Multi-View Stereopsis , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[77]  D. Brady,et al.  Dispersion multiplexing with broadband filtering for miniature spectrometers. , 2007, Applied optics.

[78]  Jan-Michael Frahm,et al.  Real-Time Plane-Sweeping Stereo with Multiple Sweeping Directions , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[79]  Ramesh Raskar,et al.  Dappled photography: mask enhanced cameras for heterodyned light fields and coded aperture refocusing , 2007, ACM Trans. Graph..

[80]  Rémi Ronfard,et al.  Action Recognition from Arbitrary Views using 3D Exemplars , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[81]  Cordelia Schmid,et al.  Learning realistic human actions from movies , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[82]  Huosheng Hu,et al.  Human motion tracking for rehabilitation - A survey , 2008, Biomed. Signal Process. Control..

[83]  Moshe Ben-Ezra,et al.  An LED-only BRDF measurement device , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[84]  Patrick Pérez,et al.  Cross-View Action Recognition from Temporal Self-similarities , 2008, ECCV.

[85]  Mubarak Shah,et al.  Learning human actions via information maximization , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[86]  Chia-Kai Liang,et al.  Programmable aperture photography: multiplexed light field acquisition , 2008, SIGGRAPH 2008.

[87]  Charles Kemp,et al.  The discovery of structural form , 2008, Proceedings of the National Academy of Sciences.

[88]  David A. McAllester,et al.  A discriminatively trained, multiscale, deformable part model , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[89]  Ali Farhadi,et al.  Learning to Recognize Activities from the Wrong View Point , 2008, ECCV.

[90]  Pascal Fua,et al.  On benchmarking camera calibration and multi-view stereo for high resolution imagery , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[91]  Shuntaro Yamazaki,et al.  Temporal Dithering of Illumination for Fast Active Vision , 2008, ECCV.

[92]  Rama Chellappa,et al.  Statistical analysis on Stiefel and Grassmann manifolds with applications in computer vision , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[93]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[94]  Jean-Philippe Pons,et al.  Towards high-resolution large-scale multi-view stereo , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[95]  Daniel Cremers,et al.  Continuous Global Optimization in Multiview 3D Reconstruction , 2007, International Journal of Computer Vision.

[96]  Marc Pollefeys,et al.  Camera Network Calibration and Synchronization from Silhouettes in Archived Video , 2010, International Journal of Computer Vision.

[97]  Ramesh Raskar,et al.  Optimal single image capture for motion deblurring , 2009, CVPR.

[98]  Avinash C. Kak,et al.  Distributed and lightweight multi-camera human activity classification , 2009, 2009 Third ACM/IEEE International Conference on Distributed Smart Cameras (ICDSC).

[99]  Mubarak Shah,et al.  Incremental action recognition using feature-tree , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[100]  Jiebo Luo,et al.  Recognizing realistic actions from videos “in the wild” , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[101]  Andrew Zisserman,et al.  Multiple kernels for object detection , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[102]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[103]  Jean-Christophe Nebel,et al.  View and Style-Independent Action Manifolds for Human Activity Recognition , 2010, ECCV.

[104]  Richard Szeliski,et al.  Towards Internet-scale multi-view stereo , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[105]  Qiang Wu,et al.  Support vector regression for multi-view gait recognition based on local motion feature selection , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[106]  Yoav Y. Schechner,et al.  Multiplexed fluorescence unmixing , 2010, 2010 IEEE International Conference on Computational Photography (ICCP).

[107]  Jitendra Malik,et al.  Shape matching and object recognition using shape contexts , 2010, 2010 3rd International Conference on Computer Science and Information Technology.

[108]  Patrick Pérez,et al.  View-Independent Action Recognition from Temporal Self-Similarities , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[109]  Pascal Fua,et al.  Efficient large-scale multi-view stereo for ultra high-resolution image sets , 2011, Machine Vision and Applications.

[110]  Silvio Savarese,et al.  Cross-view action recognition via view knowledge transfer , 2011, CVPR 2011.

[111]  Rémi Ronfard,et al.  A survey of vision-based methods for action representation, segmentation and recognition , 2011, Comput. Vis. Image Underst..

[112]  Kiriakos N. Kutulakos,et al.  Light-Efficient Photography , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.