论文信息 - An Additive Latent Feature Model for Transparent Object Recognition

An Additive Latent Feature Model for Transparent Object Recognition

Existing methods for visual recognition based on quantized local features can perform poorly when local features exist on transparent surfaces, such as glass or plastic objects. There are characteristic patterns to the local appearance of transparent objects, but they may not be well captured by distances to individual examples or by a local pattern codebook obtained by vector quantization. The appearance of a transparent patch is determined in part by the refraction of a background pattern through a transparent medium: the energy from the background usually dominates the patch appearance. We model transparent local patch appearance using an additive model of latent factors: background factors due to scene content, and factors which capture a local edge energy distribution characteristic of the refraction. We implement our method using a novel LDA-SIFT formulation which performs LDA prior to any vector quantization step; we discover latent topics which are characteristic of particular transparent patches and quantize the SIFT space into transparent visual words according to the latent topic dimensions. No knowledge of the background scene is required at test time; we show examples recognizing transparent glasses in a domestic environment.

[1] T. Poggio,et al. Hierarchical models of object recognition in cortex , 1999, Nature Neuroscience.

[2] David Salesin,et al. Environment matting and compositing , 1999, SIGGRAPH.

[3] Edward H. Adelson,et al. On seeing stuff: the perception of materials by humans and machines , 2001, IS&T/SPIE Electronic Imaging.

[4] Jiri Matas,et al. Robust wide-baseline stereo from maximally stable extremal regions , 2004, Image Vis. Comput..

[5] Pietro Perona,et al. Object class recognition by unsupervised scale-invariant learning , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[6] Andrew Zisserman,et al. Video Google: a text retrieval approach to object matching in videos , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[7] David W. Jacobs,et al. Using specularities for recognition , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[8] Michael I. Jordan,et al. Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..

[9] Trevor Darrell,et al. Conditional Random Fields for Object Recognition , 2004, NIPS.

[10] Gabriela Csurka,et al. Visual categorization with bags of keypoints , 2002, eccv 2004.

[11] Thomas Hofmann,et al. Unsupervised Learning by Probabilistic Latent Semantic Analysis , 2004, Machine Learning.

[12] Mark Steyvers,et al. Finding scientific topics , 2004, Proceedings of the National Academy of Sciences of the United States of America.

[13] Jan-Olof Eklundh,et al. Classifying Materials from Their Reflectance Properties , 2004, ECCV.

[14] Jitendra Malik,et al. Shape matching and object recognition using low distortion correspondences , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[15] Pietro Perona,et al. Learning object categories from Google's image search , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[16] Antonio Torralba,et al. Learning hierarchical models of scenes, objects, and parts , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[17] Alexei A. Efros,et al. Discovering objects and their location in images , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[18] Bill Triggs,et al. Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[19] Heinrich H. Bülthoff,et al. Low-Level Image Cues in the Perception of Translucent Materials , 2005, TAP.

[20] David A. Forsyth,et al. Finding glass , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[21] Juan Carlos Niebles,et al. Unsupervised Learning of Human Action Categories , 2006 .

[22] Jean Ponce,et al. A Geodesic Active Contour Framework for Finding Glass , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[23] Cordelia Schmid,et al. Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[24] Michael J. Black,et al. Specular Flow and the Recovery of Surface Structure , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[25] David Nistér,et al. Scalable Recognition with a Vocabulary Tree , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[26] Juan Carlos Niebles,et al. Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words , 2008, International Journal of Computer Vision.

[27] Eli Shechtman,et al. Matching Local Self-Similarities across Images and Videos , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[28] David M. Blei,et al. Supervised Topic Models , 2007, NIPS.

[29] Silvio Savarese,et al. Detecting Specular Surfaces on Natural Images , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[30] Trevor Darrell,et al. The Pyramid Match Kernel: Efficient Learning with Sets of Features , 2007, J. Mach. Learn. Res..

[31] David A. McAllester,et al. A discriminatively trained, multiscale, deformable part model , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[32] Bernt Schiele,et al. Decomposition, discovery and detection of visual categories using topic models , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[33] Chong Wang,et al. Simultaneous image classification and annotation , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[34] Matthijs C. Dorst. Distinctive Image Features from Scale-Invariant Keypoints , 2011 .