Automated extraction of fragments of Bayesian networks from textual sources

Abstract

Mining large amounts of unstructured data to extract meaningful, accurate, and actionable information is at the core of a variety of research disciplines, including computer science, mathematical and statistical modelling, and knowledge engineering. In particular, the ability to model complex scenarios based on unstructured datasets is an important step towards an integrated and accurate knowledge extraction approach, which would provide significant insight into any decision-making process driven by Big Data analysis. However, multiple challenges must be addressed to achieve this, especially when large, unstructured datasets are considered. In this article we propose and analyse a novel method to extract and build fragments of Bayesian networks (BNs) from large unstructured data sources. The results of our analysis show the potential of our approach and highlight its accuracy and efficiency. More specifically, when compared with existing approaches, our method addresses specific challenges posed by the automated extraction of BNs, with extensive applications to unstructured and highly dynamic data sources. The aim of this work is to advance the state of the art in the automated extraction of BNs from unstructured datasets; BNs provide a versatile and powerful modelling framework to facilitate knowledge discovery in complex decision scenarios.
