BayeSuites: An open web framework for massive Bayesian networks focused on neuroscience

Abstract

BayeSuites is the first web framework for learning, visualizing, and interpreting Bayesian networks (BNs) that can scale to tens of thousands of nodes while providing a fast and user-friendly experience. This paper reviews all the features that make this possible: scalability, extensibility, interoperability, ease of use, and interpretability. Scalability is the key factor for learning and processing massive networks within a reasonable time, while extensibility and interoperability are necessary for maintainable software that remains open to new functionality. Ease of use and interpretability are fundamental to model interpretation, in a spirit similar to the recent trend of explainable artificial intelligence. We present the capabilities of the proposed framework through a real example: a BN learned from genomic data obtained from the Allen Institute for Brain Science. We also demonstrate the extensibility of the software with our BN-based probabilistic clustering implementation, applied to another genomic-data example.
