Paper Information - Scene Graphs: A Survey of Generations and Applications

Scene Graphs: A Survey of Generations and Applications

A scene graph is a structured representation of a scene that explicitly expresses the objects, their attributes, and the relationships between objects. As computer vision technology continues to develop, people are no longer satisfied with simply detecting and recognizing objects in images; they look forward to a higher level of understanding and reasoning about visual scenes. For example, given an image, we want not only to detect and recognize the objects in it, but also to understand the relationships between them (visual relationship detection) and to generate a text description based on the image content (image captioning). Alternatively, we might want the machine to tell us what the little girl in the image is doing (visual question answering, VQA), or even to remove the dog from the image and find similar images (image editing and retrieval). These tasks require a higher level of understanding and reasoning than object detection alone. The scene graph is a powerful tool for this kind of scene understanding. Scene graphs have therefore attracted the attention of a large number of researchers, and the related research is often cross-modal, complex, and rapidly developing. However, no relatively systematic survey of scene graphs exists at present. To this end, this survey conducts a comprehensive investigation of current scene graph research. More specifically, we first summarize the general definition of the scene graph, then conduct a comprehensive and systematic discussion of scene graph generation (SGG) methods and of SGG with the aid of prior knowledge. We then investigate the main applications of scene graphs and summarize the most commonly used datasets. Finally, we provide some insights into the future development of scene graphs.
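
As an illustration of the definition given in the abstract, a scene graph can be held in a small data structure with objects as nodes, attributes attached to nodes, and relationships as directed, labeled edges. The sketch below is only one possible encoding and is not taken from the survey: the class name `SceneGraph`, its methods, and the predicate "playing with" are hypothetical choices made for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class SceneGraph:
    """Illustrative container (hypothetical, not the survey's formalism)."""
    objects: List[str] = field(default_factory=list)                 # node labels
    attributes: Dict[int, List[str]] = field(default_factory=dict)   # node index -> attributes
    relationships: List[Tuple[int, str, int]] = field(default_factory=list)  # (subject, predicate, object)

    def add_object(self, name: str, attributes: Tuple[str, ...] = ()) -> int:
        """Add an object node and return its index."""
        idx = len(self.objects)
        self.objects.append(name)
        self.attributes[idx] = list(attributes)
        return idx

    def relate(self, subject: int, predicate: str, obj: int) -> None:
        """Add a directed, labeled edge between two existing nodes."""
        self.relationships.append((subject, predicate, obj))

    def triplets(self) -> List[Tuple[str, str, str]]:
        """Return relationships as readable (subject, predicate, object) triplets."""
        return [(self.objects[s], p, self.objects[o]) for s, p, o in self.relationships]


# Example scene loosely based on the abstract's "little girl" and "dog";
# the predicate is an assumption made for illustration.
graph = SceneGraph()
girl = graph.add_object("girl", ("little",))
dog = graph.add_object("dog")
graph.relate(girl, "playing with", dog)
```

Calling `graph.triplets()` yields the relationships in subject-predicate-object form, which is the shape that tasks such as visual relationship detection are typically described in.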
