ShapeLearner: Towards Shape-Based Visual Knowledge Harvesting

The deluge of images on the Web has led to a number of efforts to organize images semantically and mine visual knowledge. Despite enormous progress on categorizing entire images or bounding boxes, only a few studies have targeted fine-grained image understanding at the level of specific shape contours. For instance, beyond recognizing that an image portrays a cat, we may wish to distinguish its legs, head, tail, and so on. To this end, we present ShapeLearner, a system that acquires such visual knowledge about object shapes and their parts in a semantic taxonomy, and then exploits this hierarchy to analyze new kinds of objects that it has not observed before. ShapeLearner jointly learns this knowledge from sets of segmented images. The space of label and segmentation hypotheses is first pruned, and the remaining hypotheses are then evaluated using Integer Linear Programming. Experiments on a variety of shape classes show the accuracy and effectiveness of our method.
