A Semantic-Based Approach for Handling Incomplete and Inaccurate Provenance in Reservoir Engineering

Provenance is becoming increasingly important as a reliable estimator of data quality. However, provenance collection mechanisms in the reservoir engineering domain often yield incomplete provenance information. In this paper, we address the problem of predicting missing provenance information in reservoir engineering. Based on the observation that data items with specific semantic “connections” may share the same provenance, our approach annotates data items with domain entities defined in a domain ontology and represents these “connections” as sequences of relationships (also known as semantic associations) in the ontology graph. By analyzing annotated historical datasets with complete provenance information, we capture semantic associations that may imply identical provenance. A statistical analysis then assigns a probability value to each discovered association, indicating the confidence of that association when it is used for future provenance prediction. We develop a voting algorithm that uses the semantic associations and their confidence measures to predict the missing provenance information. Because existing provenance information can itself be incorrect due to errors in the manual provenance annotation procedure, we extend the voting algorithm with a prediction algorithm that takes into account both the confidence measures of the semantic associations and the accuracy of the existing provenance. For each prediction result, a probability value is calculated as a measure of its trust. We develop the ProPSA (Provenance Prediction based on Semantic Associations) system, which uses our proposed approaches to handle incomplete and inaccurate provenance information in reservoir engineering. Our evaluation shows that the average precision of our approach is above 85% when one-third of the provenance information is missing.
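The idea of confidence-weighted voting described above can be illustrated with a minimal Python sketch. It is not the ProPSA implementation: the function names (`estimate_confidence`, `predict_provenance`), the data layout, the example association types, and the use of the normalized vote share as a trust value are all assumptions made for illustration. Confidence is estimated here simply as the fraction of historical item pairs linked by an association type that share the same provenance.

```python
from collections import defaultdict


def estimate_confidence(history):
    """Estimate a confidence value per association type (hypothetical helper).

    history: iterable of (association_type, provenance_a, provenance_b)
    tuples taken from historical data with complete provenance.
    Returns the fraction of pairs per association type whose two
    provenance labels are identical.
    """
    same = defaultdict(int)
    total = defaultdict(int)
    for assoc, prov_a, prov_b in history:
        total[assoc] += 1
        if prov_a == prov_b:
            same[assoc] += 1
    return {assoc: same[assoc] / total[assoc] for assoc in total}


def predict_provenance(neighbors, confidence, known_provenance):
    """Predict a missing provenance label by confidence-weighted voting.

    neighbors: list of (item_id, association_type) pairs connected to
        the item whose provenance is missing.
    confidence: dict mapping association_type to a value in [0, 1].
    known_provenance: dict mapping item_id to its provenance label.
    Returns (label, trust); trust is the winner's share of all votes,
    which is an assumption of this sketch, not the paper's formula.
    """
    scores = defaultdict(float)
    for item_id, assoc in neighbors:
        label = known_provenance.get(item_id)
        if label is None:
            continue  # this neighbor has no provenance and cannot vote
        scores[label] += confidence.get(assoc, 0.0)
    total = sum(scores.values())
    if total == 0:
        return None, 0.0  # no usable votes
    best = max(scores, key=scores.get)
    return best, scores[best] / total


if __name__ == "__main__":
    # Hypothetical association types and provenance labels.
    history = [
        ("same_well", "simulatorA", "simulatorA"),
        ("same_well", "simulatorA", "manual"),
        ("same_field", "simulatorA", "manual"),
    ]
    conf = estimate_confidence(history)
    neighbors = [("d1", "same_well"), ("d2", "same_field"), ("d3", "same_well")]
    known = {"d1": "simulatorA", "d2": "manual", "d3": "simulatorA"}
    print(predict_provenance(neighbors, conf, known))
```

Handling inaccurate existing provenance, as the extended algorithm does, could be sketched by additionally scaling each vote by an accuracy estimate for the neighbor's recorded provenance; that weighting is likewise an assumption and is left out of this sketch.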
