Open Research Online

Knowledge Graph Construction with a façade: a unified method to access heterogeneous data sources on the Web

Data integration is the dominant use case for RDF Knowledge Graphs. However, Web resources come in formats with weak semantics (for example, CSV and JSON) or formats specific to a given application (for example, BibTeX, HTML, and Markdown). To solve this problem, Knowledge Graph Construction (KGC) is gaining momentum due to its focus on supporting users in transforming data into RDF. However, using existing KGC frameworks results in complex data processing pipelines that mix structural and semantic mappings, and whose development and maintenance constitute a significant bottleneck for KG engineers. Such frameworks force users to rely on different tools, sometimes based on heterogeneous languages, for inspecting sources, designing mappings, and generating triples, thus making the process unnecessarily complicated. We argue that it is possible and desirable to equip KG engineers with the ability to interact with Web data formats by relying on their expertise in RDF and the well-established SPARQL query language [2]. In this article, we study a unified method for data access to heterogeneous data sources with Facade-X, a meta-model implemented in a new data integration system called SPARQL Anything. We demonstrate that our approach is theoretically sound, since it allows a single meta-model, based on RDF, to represent data from (a) any file format expressible in BNF syntax, as well as (b) any relational database. We compare our method to state-of-the-art approaches in terms of usability (cognitive complexity of the mappings) and general performance. Finally, we discuss the benefits and challenges of this novel approach by engaging with the reference user community.

the languages or syntaxes needed. 66.6% considered it very important or essential that the mappings should be easy to read and interpret. 70.3% considered it very important or essential that the system must be easy to learn for a Semantic Web practitioner.
Participants were asked how important it is that the system is able to support new types of data sources without changes to the mapping language. 7.4% considered this essential, 48.1% very important, and 44.4% moderately important. These results highlight the value to users of some founding assumptions of our system design. 25.9% considered it very important or essential to support complex manipulations within a single mapping file. 40.7% considered it very important or essential to support mappings to multiple data sources within the same mapping file. 40.7% considered it very important to enable data source exploration without committing to a

[1]  E. Daga,et al.  CLEF. A Linked Open Data native system for Crowdsourcing , 2022, Journal on Computing and Cultural Heritage.

[2]  Luigi Asprino,et al.  Integrating citizen experiences in cultural heritage archives: requirements, state of the art, and challenges , 2021 .

[3]  Enrico Motta,et al.  Sequential linked data: The state of affairs , 2021, Semantic Web.

[4]  François Scharffe,et al.  Knowledge Graph Benchmarking Report 2021 , 2021 .

[5]  Paul Mulholland,et al.  Facade-X: an opinionated approach to SPARQL anything , 2021, Studies on the Semantic Web.

[6]  Oscar Corcho,et al.  Enhancing virtual ontology based access over tabular data with Morph-CSV , 2021, Semantic Web.

[7]  Mary Williamson,et al.  Recipes for Building an Open-Domain Chatbot , 2020, EACL.

[8]  Sergey Levine,et al.  Recurrent Independent Mechanisms , 2019, ICLR.

[9]  Óscar Corcho,et al.  Knowledge Graph Construction: An ETL System-Based Overview , 2021 .

[10]  Juan Manuel Cueva Lovelle,et al.  ShExML: improving the usability of heterogeneous data mapping languages for first-time users , 2020, PeerJ Comput. Sci..

[11]  P. Mulholland,et al.  Enabling Multiple Voices in the Museum: Challenges and Approaches , 2020, Digital Culture & Society.

[12]  Tsvi Kuflik,et al.  Towards Advanced Interfaces for Citizen Curation , 2020, AVI²CH@AVI.

[13]  Maria-Esther Vidal,et al.  SDM-RDFizer: An RML Interpreter for the Efficient Creation of RDF Knowledge Graphs , 2020, CIKM.

[14]  Julien Mairal,et al.  Unsupervised Learning of Visual Features by Contrasting Cluster Assignments , 2020, NeurIPS.

[15]  Pierre H. Richemond,et al.  Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning , 2020, NeurIPS.

[16]  Mark Chen,et al.  Language Models are Few-Shot Learners , 2020, NeurIPS.

[17]  Nicolas Usunier,et al.  End-to-End Object Detection with Transformers , 2020, ECCV.

[18]  George Papadakis,et al.  OBDA for the Web: Creating Virtual RDF Graphs On Top of Web Data Sources , 2020, ArXiv.

[19]  Kaiming He,et al.  Improved Baselines with Momentum Contrastive Learning , 2020, ArXiv.

[20]  Enrico Motta,et al.  Towards a Framework for Visual Intelligence in Service Robotics: Epistemic Requirements and Gap Analysis , 2020, KR.

[21]  Geoffrey E. Hinton,et al.  A Simple Framework for Contrastive Learning of Visual Representations , 2020, ICML.

[22]  Quoc V. Le,et al.  Towards a Human-like Open-Domain Chatbot , 2020, ArXiv.

[23]  Laurens van der Maaten,et al.  Self-Supervised Learning of Pretext-Invariant Representations , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Guillaume Lample,et al.  Deep Learning for Symbolic Mathematics , 2019, ICLR.

[25]  Ross B. Girshick,et al.  Momentum Contrast for Unsupervised Visual Representation Learning , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[26]  Christopher Joseph Pal,et al.  A Meta-Transfer Objective for Learning to Disentangle Causal Mechanisms , 2019, ICLR.

[27]  Yoshua Bengio,et al.  CLOSURE: Assessing Systematic Generalization of CLEVR Models , 2019, ViGIL@NeurIPS.

[28]  Alan Geoffrey Hall,et al.  The 'lish': a data model for grid free spreadsheets , 2019 .

[29]  Enrico Motta,et al.  Modelling and Querying Lists in RDF. A Pragmatic Study , 2019, QuWeDa@ISWC.

[30]  Fabio Paternò,et al.  End-user development for personalizing applications, things, and robots , 2019, Int. J. Hum. Comput. Stud..

[31]  Nan Rosemary Ke,et al.  Learning Neural Causal Models from Unknown Interventions , 2019, ArXiv.

[32]  David Lopez-Paz,et al.  Invariant Risk Minimization , 2019, ArXiv.

[33]  Yee Whye Teh,et al.  Stacked Capsule Autoencoders , 2019, NeurIPS.

[34]  R Devon Hjelm,et al.  Learning Representations by Maximizing Mutual Information Across Views , 2019, NeurIPS.

[35]  Fabien L. Gandon,et al.  Enabling Automatic Discovery and Querying of Web APIs at Web Scale using Linked Data Standards , 2019, WWW.

[36]  Guillaume Lample,et al.  Cross-lingual Language Model Pretraining , 2019, NeurIPS.

[37]  Aaron C. Courville,et al.  Systematic Generalization: What Is Required and Can It Be Learned? , 2018, ICLR.

[38]  Diego Calvanese,et al.  Ontology-based data access - Beyond relational sources , 2019, Intelligenza Artificiale.

[39]  Ming-Wei Chang,et al.  BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.

[40]  Paul Mulholland,et al.  Using SPARQL - The Practitioners' Viewpoint , 2018, EKAW.

[41]  Yoshua Bengio,et al.  BabyAI: First Steps Towards Grounded Language Learning With a Human In the Loop , 2018, ArXiv.

[42]  Diego Calvanese,et al.  Efficient Handling of SPARQL OPTIONAL for OBDA , 2018, SEMWEB.

[43]  Diego Calvanese,et al.  Ontology-Based Data Access: A Survey , 2018, IJCAI.

[44]  Ruben Verborgh,et al.  Declarative Rules for Linked Data Generation at Your Fingertips! , 2018, ESWC.

[45]  Jürgen Schmidhuber,et al.  Relational Neural Expectation Maximization: Unsupervised Discovery of Objects and their Interactions , 2018, ICLR.

[46]  Maurizio Lenzerini,et al.  Using Ontologies for Semantic Data Integration , 2018, A Comprehensive Guide Through the Italian Database Research.

[47]  Luca Antiga,et al.  Automatic differentiation in PyTorch , 2017 .

[48]  S. Dehaene,et al.  What is consciousness, and could machines have it? , 2017, Science.

[49]  Adam Wierman,et al.  Thinking Fast and Slow , 2017, SIGMETRICS Perform. Evaluation Rev..

[50]  Yoshua Bengio.  The Consciousness Prior , 2017, ArXiv.

[51]  Lukasz Kaiser,et al.  Attention is All you Need , 2017, NIPS.

[52]  Antoine Zimmermann,et al.  A SPARQL Extension for Generating RDF from Heterogeneous Formats , 2017, ESWC.

[53]  Sergey Levine,et al.  Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks , 2017, ICML.

[54]  Daniel P. Miranker,et al.  A Pay-As-You-Go Methodology for Ontology-Based Data Access , 2017, IEEE Internet Computing.

[55]  Diego Calvanese,et al.  Ontop: Answering SPARQL queries over relational databases , 2016, Semantic Web.

[56]  Bernhard Schölkopf,et al.  Discovering Causal Signals in Images , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[57]  Paul Mulholland,et al.  Characterizing the Landscape of Musical Data on the Web: state of the art and challenges , 2017, WHiSe@ISWC.

[58]  Diego Reforgiato Recupero,et al.  Framester: A Wide Coverage Linguistic Linked Data Hub , 2016, EKAW.

[59]  Geoffrey E. Hinton,et al.  Using Fast Weights to Attend to the Recent Past , 2016, NIPS.

[60]  Brad A. Myers,et al.  Using and Exploring Hierarchical Data in Spreadsheets , 2016, CHI.

[61]  Joshua B. Tenenbaum,et al.  Building machines that learn and think like people , 2016, Behavioral and Brain Sciences.

[62]  Demis Hassabis,et al.  Mastering the game of Go with deep neural networks and tree search , 2016, Nature.

[63]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[64]  Ali Farhadi,et al.  You Only Look Once: Unified, Real-Time Object Detection , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[65]  Mathieu d'Aquin,et al.  The Open University Linked Data - data.open.ac.uk , 2016, Semantic Web.

[66]  Johan Montagnat,et al.  Translation of Relational and Non-relational Databases into RDF with xR2RML , 2015, WEBIST.

[67]  Enrico Motta,et al.  Making sense of description logics , 2015, SEMANTiCS.

[68]  Mariano Rodriguez-Muro,et al.  Efficient SPARQL-to-SQL with R2RML mappings , 2015, J. Web Semant..

[69]  Xinlei Chen,et al.  Microsoft COCO Captions: Data Collection and Evaluation Server , 2015, ArXiv.

[70]  Jason Weston,et al.  End-To-End Memory Networks , 2015, NIPS.

[71]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[72]  Victor S. Lempitsky,et al.  Unsupervised Domain Adaptation by Backpropagation , 2014, ICML.

[73]  Yoshua Bengio,et al.  Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.

[74]  Ming Yang,et al.  Web-scale training for face identification , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[75]  Craig A. Knoblock,et al.  KR2RML: An Alternative Interpretation of R2RML for Heterogenous Sources , 2015, COLD.

[76]  Enrico Daga,et al.  A BASILar Approach for Building Web APIs on Top of SPARQL Endpoints , 2015, SALAD@ESWC.

[77]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[78]  Stefan Manegold,et al.  GeoTriples: a Tool for Publishing Geospatial Data as RDF Graphs Using R2RML Mappings , 2014, TC/SSN@ISWC.

[79]  Daniel P. Miranker,et al.  OBDA: Query Rewriting or Materialization? In Practice, Both! , 2014, SEMWEB.

[80]  Michael Zakharyaschev,et al.  Answering SPARQL Queries over Databases under OWL 2 QL Entailment Regime , 2014, SEMWEB.

[81]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[82]  Freddy Priyatna,et al.  Formalisation and experiences of R2RML-based SPARQL to SQL query translation using morph , 2014, WWW.

[83]  Max Welling,et al.  Auto-Encoding Variational Bayes , 2013, ICLR.

[84]  Joan Bruna,et al.  Intriguing properties of neural networks , 2013, ICLR.

[85]  Rik Van de Walle,et al.  RML: A Generic Language for Integrated RDF Mappings of Heterogeneous Data , 2014, LDOW.

[86]  Oriol Nieto,et al.  JAMS: A JSON Annotated Music Specification for Reproducible MIR Research , 2014, ISMIR.

[87]  Óscar Corcho,et al.  Engineering optimisations in query rewriting for OBDA , 2013, I-SEMANTICS '13.

[88]  Alex Graves,et al.  Generating Sequences With Recurrent Neural Networks , 2013, ArXiv.

[89]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[90]  Diego Calvanese,et al.  Query Processing under GLAV Mappings for Relational and Graph Databases , 2012, Proc. VLDB Endow..

[91]  Tara N. Sainath,et al.  Fundamental Technologies in Modern Speech Recognition , 2012, doi:10.1109/MSP.2012.2205597 .

[92]  Nitish Srivastava,et al.  Improving neural networks by preventing co-adaptation of feature detectors , 2012, ArXiv.

[93]  Antoine Isaac,et al.  data.europeana.eu: The Europeana Linked Open Data Pilot , 2011, Dublin Core Conference.

[94]  Andrea Giovanni Nuzzolese,et al.  Gathering lexical linked data and knowledge patterns from FrameNet , 2011, K-CAP '11.

[95]  Geoffrey E. Hinton,et al.  Transforming Auto-Encoders , 2011, ICANN.

[96]  Yoshua Bengio,et al.  Deep Sparse Rectifier Neural Networks , 2011, AISTATS.

[97]  Mary Shaw,et al.  The state of the art in end-user software engineering , 2011, ACM Comput. Surv..

[98]  Clément Farabet,et al.  Torch7: A Matlab-like Environment for Machine Learning , 2011, NIPS 2011.

[99]  Geoffrey E. Hinton,et al.  Rectified Linear Units Improve Restricted Boltzmann Machines , 2010, ICML.

[100]  Roel Wieringa,et al.  Design science methodology: principles and practice , 2010, 2010 ACM/IEEE 32nd International Conference on Software Engineering.

[101]  Aapo Hyvärinen,et al.  Noise-contrastive estimation: A new estimation principle for unnormalized statistical models , 2010, AISTATS.

[102]  Raymond R. Panko,et al.  Revising the Panko-Halverson taxonomy of spreadsheet errors , 2008, Decis. Support Syst..

[103]  Yann LeCun,et al.  What is the best multi-stage architecture for object recognition? , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[104]  Fei-Fei Li,et al.  ImageNet: A large-scale hierarchical image database , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[105]  Aldo Gangemi,et al.  Ontology Design Patterns , 2005 .

[106]  Geoffrey E. Hinton,et al.  Deep Belief Networks for phone recognition , 2009 .

[107]  Jason Weston,et al.  A unified architecture for natural language processing: deep neural networks with multitask learning , 2008, ICML '08.

[108]  Diego Calvanese,et al.  Linking Data to Ontologies , 2008, J. Data Semant..

[109]  Diego Calvanese,et al.  Tractable Reasoning and Efficient Query Answering in Description Logics: The DL-Lite Family , 2007, Journal of Automated Reasoning.

[110]  Yoshua Bengio,et al.  Greedy Layer-Wise Training of Deep Networks , 2006, NIPS.

[111]  Geoffrey E. Hinton,et al.  Reducing the Dimensionality of Data with Neural Networks , 2006, Science.

[112]  Yee Whye Teh,et al.  A Fast Learning Algorithm for Deep Belief Nets , 2006, Neural Computation.

[113]  Henry Lieberman,et al.  End-User Development: An Emerging Paradigm , 2006, End User Development.

[114]  Yann LeCun,et al.  Learning a similarity metric discriminatively, with application to face verification , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[115]  Graeme S Halford,et al.  The development of deductive reasoning: How important is complexity? , 2004 .

[116]  David M. Sobel,et al.  A theory of causal learning in children: causal maps and Bayes nets. , 2004, Psychological review.

[117]  Paul E. Utgoff,et al.  Many-Layered Learning , 2002, Neural Computation.

[118]  Yoshua Bengio,et al.  A Neural Probabilistic Language Model , 2003, J. Mach. Learn. Res..

[119]  John B. Lowe,et al.  The Berkeley FrameNet Project , 1998, ACL.

[120]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[121]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[122]  Sebastian Thrun,et al.  Is Learning The n-th Thing Any Easier Than Learning The First? , 1995, NIPS.

[123]  Anthony J. Robinson,et al.  An application of recurrent nets to phone probability estimation , 1994, IEEE Trans. Neural Networks.

[124]  Yann LeCun,et al.  Signature Verification Using A "Siamese" Time Delay Neural Network , 1993, Int. J. Pattern Recognit. Artif. Intell..

[125]  Ralph E. Johnson,et al.  Design Patterns: Abstraction and Reuse of Object-Oriented Design , 1993, ECOOP.

[126]  Geoffrey E. Hinton,et al.  Self-organizing neural network that discovers surfaces in random-dot stereograms , 1992, Nature.

[127]  Yoshua Bengio,et al.  Learning a synaptic learning rule , 1991, IJCNN-91-Seattle International Joint Conference on Neural Networks.

[128]  Geoffrey E. Hinton.  Mapping Part-Whole Hierarchies into Connectionist Networks , 1990, Artif. Intell..

[129]  Eric Allman,et al.  RAP: a ring array processor for multilayer perceptron applications , 1990, International Conference on Acoustics, Speech, and Signal Processing.

[130]  Lawrence D. Jackel,et al.  Backpropagation Applied to Handwritten Zip Code Recognition , 1989, Neural Computation.

[131]  B. Baars A cognitive theory of consciousness , 1988 .

[132]  R. Shepard,et al.  Toward a universal law of generalization for psychological science. , 1987, Science.

[133]  Geoffrey E. Hinton.  Using fast weights to deblur old memories , 1987 .

[134]  Geoffrey E. Hinton,et al.  Learning representations by back-propagating errors , 1986, Nature.

[135]  D. C. Essen,et al.  Hierarchical organization and functional streams in the visual cortex , 1983, Trends in Neurosciences.

[136]  Geoffrey E. Hinton.  A Parallel Computation that Assigns Canonical Object-Based Frames of Reference , 1981, IJCAI.

[137]  Stephen N. Zilles,et al.  Programming with abstract data types , 1974, SIGPLAN Symposium on Very High Level Languages.