Extracting entity-specific substructures for RDF graph embeddings

Knowledge Graphs (KGs) have become useful sources of structured data for information retrieval and data analytics tasks. Enabling complex analytics, however, requires entities in KGs to be represented in a way that is suitable for Machine Learning tasks. Several approaches have recently been proposed for obtaining vector representations of KGs by identifying and extracting relevant graph substructures using both uniform and biased random walks. However, such approaches lead to representations comprising mostly popular, rather than relevant, entities in the KG. In KGs that contain many different types of entities (such as those in Linked Open Data), a given target entity may have its own distinct set of most relevant nodes and edges. We propose specificity as an accurate measure for identifying the most relevant, entity-specific nodes and edges. We develop a scalable method based on bidirectional random walks to compute specificity. Our experimental evaluation shows that specificity-based biased random walks extract more meaningful substructures (in terms of size and relevance) than the state of the art, and that the graph embeddings learned from the extracted substructures perform well against existing methods in common data mining tasks.
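As a rough illustration of what a biased random walk over a KG looks like, the following Python sketch extracts walks from a target entity, choosing each outgoing edge with probability proportional to a per-predicate weight. It is only a sketch under stated assumptions: the toy graph, the names `biased_walk` and `extract_walks`, and the weights in `PREDICATE_WEIGHT` are all hypothetical. In particular, the weights are placeholders and are not specificity values, since the abstract does not define how specificity is computed.

```python
import random

# Toy KG as adjacency lists: subject -> list of (predicate, object).
# All entities, predicates and weights here are made up for illustration.
GRAPH = {
    "Film_A": [("director", "Person_X"), ("country", "Country_Y"),
               ("starring", "Person_Z")],
    "Person_X": [("birthPlace", "Country_Y"), ("knownFor", "Film_A")],
    "Person_Z": [("birthPlace", "Country_Y")],
    "Country_Y": [],
}

# Placeholder relevance weight per predicate (an assumption, NOT the
# paper's specificity scores).
PREDICATE_WEIGHT = {
    "director": 0.9, "starring": 0.8, "knownFor": 0.7,
    "birthPlace": 0.3, "country": 0.1,
}


def biased_walk(graph, weights, start, max_depth, rng):
    """Return one walk [entity, predicate, entity, ...] starting at `start`."""
    walk = [start]
    node = start
    for _ in range(max_depth):
        edges = graph.get(node, [])
        edge_weights = [weights.get(pred, 0.0) for pred, _ in edges]
        # Stop at dead ends or when no outgoing edge has positive weight.
        if not edges or sum(edge_weights) <= 0:
            break
        # Pick the next edge with probability proportional to its weight.
        predicate, node = rng.choices(edges, weights=edge_weights, k=1)[0]
        walk.extend([predicate, node])
    return walk


def extract_walks(graph, weights, start, num_walks=5, max_depth=2, seed=0):
    """Collect several biased walks from the same target entity."""
    rng = random.Random(seed)
    return [biased_walk(graph, weights, start, max_depth, rng)
            for _ in range(num_walks)]


if __name__ == "__main__":
    for walk in extract_walks(GRAPH, PREDICATE_WEIGHT, "Film_A"):
        print(" -> ".join(walk))
```

With weights like these, walks from `Film_A` would tend to follow `director` and `starring` edges far more often than `country`. Sequences of this kind could then serve as input to an embedding model, although that step is not shown here.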
