Predicting Missing Provenance Using Semantic Associations in Reservoir Engineering

Provenance is increasingly important as a reliable indicator of data quality. However, provenance collection mechanisms in the reservoir engineering domain often result in missing provenance information. In this paper, we address the problem of predicting missing provenance information in reservoir engineering. Based on the observation that data items with specific semantic "connections" may share the same provenance, our approach annotates data items with domain entities defined in a domain ontology, and represents these "connections" as sequences of relationships (also known as semantic associations) in the ontology graph. By analyzing annotated historical datasets with complete provenance information, we capture semantic associations that may imply identical provenance. A statistical analysis is then applied to assign a confidence value to each discovered association, indicating how far that association can be trusted when it is used for future provenance prediction. The semantic associations, along with their confidence measures, are then used by a voting algorithm to predict the missing provenance information. Our evaluation shows that the average precision of our approach is above 85% when one third of the provenance information is missing.
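The pipeline described above (learn a confidence value per semantic association from historical data, then let associations vote on the missing provenance) can be illustrated with a minimal sketch. The abstract does not specify the statistical analysis or the voting rule, so this sketch assumes that confidence is the empirical fraction of historical item pairs linked by an association that share the same provenance, and that voting is a confidence-weighted majority. The relationship names, provenance labels, and function names are all hypothetical.

```python
from collections import defaultdict


def learn_confidence(observations):
    """Estimate a confidence value per semantic association.

    observations: iterable of (association, same_provenance) pairs taken
    from historical data with complete provenance. An association is a
    tuple of relationship names (a path in the ontology graph).
    Assumption: confidence = fraction of observations with equal provenance.
    """
    hits = defaultdict(int)
    total = defaultdict(int)
    for association, same_provenance in observations:
        total[association] += 1
        if same_provenance:
            hits[association] += 1
    return {a: hits[a] / total[a] for a in total}


def predict_provenance(links, confidence):
    """Predict missing provenance by confidence-weighted voting.

    links: list of (association, known_provenance) pairs connecting the
    target item to items whose provenance is known.
    Returns the provenance with the highest total weight, or None if
    there are no links.
    """
    votes = defaultdict(float)
    for association, provenance in links:
        # Associations never seen in the historical data carry no weight.
        votes[provenance] += confidence.get(association, 0.0)
    if not votes:
        return None
    return max(votes, key=votes.get)


# Hypothetical associations (paths of ontology relationships).
SAME_WELL = ("measuredAt", "partOfWell")
SAME_FIELD = ("locatedIn", "sameField")

# Hypothetical historical observations: did the linked items share provenance?
history = [
    (SAME_WELL, True), (SAME_WELL, True), (SAME_WELL, True), (SAME_WELL, False),
    (SAME_FIELD, True), (SAME_FIELD, False), (SAME_FIELD, False), (SAME_FIELD, False),
]
confidence = learn_confidence(history)

# The target item is linked to three items with known provenance.
links = [
    (SAME_WELL, "engineer_A"),
    (SAME_FIELD, "simulator_X"),
    (SAME_FIELD, "simulator_X"),
]
predicted = predict_provenance(links, confidence)
print(confidence[SAME_WELL], confidence[SAME_FIELD], predicted)
```

In this toy data, one high-confidence association outweighs two low-confidence ones, which is the reason for weighting votes by confidence rather than counting them equally.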
