A knowledge-based platform for Big Data analytics based on publish/subscribe services and stream processing

Knowledge-based solution for automatic schema mapping to manage data heterogeneity. Automatic ontology extraction and semantic inference for novel Big Data analytics. Integration with publish/subscribe services for large-scale analytics infrastructures.

Big Data analytics is widely regarded as an essential capability that must be further improved to increase the operating margin of both public and private enterprises, and it represents the next frontier for their innovation, competition, and productivity. Such data are typically produced in different sectors of these organizations, which are often geographically distributed worldwide, and are characterized by large size and variety. There is therefore a strong need for platforms that can handle ever larger amounts of data in contexts characterized by complex event processing systems and multiple heterogeneous sources, and that address the issues of efficiently disseminating, collecting and analyzing those data in a fully distributed way.

In such a scenario, this work proposes a way to overcome two fundamental issues: data heterogeneity and the need for advanced processing capabilities. We present a knowledge-based solution for Big Data analytics that applies automatic schema mapping to cope with data heterogeneity, and ontology extraction and semantic inference to support innovative processing. The solution, based on the publish/subscribe paradigm, has been evaluated in a simple experimental proof of concept to assess its performance and effectiveness.
