BayeSuites: An open web framework for massive Bayesian networks focused on neuroscience

BayeSuites is the first web framework for learning, visualizing, and interpreting Bayesian networks (BNs) that scales to tens of thousands of nodes while providing a fast and friendly user experience. This paper reviews the features that make this possible: scalability, extensibility, interoperability, ease of use, and interpretability. Scalability is the key factor in learning and processing massive networks within a reasonable time; extensibility and interoperability are necessary for maintainable software that is open to new functionalities. Ease of use and interpretability are fundamental to model interpretation, much as in the recent explainable artificial intelligence trend. We present the capabilities of the framework through a real example of a BN learned from genomic data obtained from the Allen Institute for Brain Science. We also demonstrate the extensibility of the software with our BN-based probabilistic clustering implementation, together with another genomic-data example.
