Towards Building Deep Networks with Bayesian Factor Graphs

We propose a multi-layer network based on the Bayesian framework of Factor Graphs in Reduced Normal Form (FGrn) applied to a two-dimensional lattice. The Latent Variable Model (LVM) is the basic building block of a quadtree hierarchy built on top of a bottom layer of random variables that represent the pixels of an image, a feature map, or, more generally, a collection of spatially distributed discrete variables. The multi-layer architecture implements a hierarchical data representation that, via belief propagation, can be used for learning and inference; typical uses are pattern completion, correction, and classification. The FGrn paradigm provides great flexibility and modularity and is a promising candidate for building deep networks: the system can easily be extended by introducing new variables that differ in cardinality and type, and prior knowledge, or supervised information, can be introduced at different scales. The paradigm also provides a handy way to build all kinds of architectures by interconnecting only three types of units: Single-Input Single-Output (SISO) blocks, Sources, and Replicators. The network is designed like a circuit diagram, belief messages flow bidirectionally through the whole system, and the learning algorithms operate only locally within each block. The framework is demonstrated in this paper on a three-layer structure applied to images extracted from a standard dataset.
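To make the three unit types concrete, the following is a minimal sketch of how their sum-product messages could be combined. All names are illustrative, not from the paper: a SISO block is modeled as a conditional probability matrix P(y|x), a Replicator as an equality node that multiplies incoming messages element-wise, and a Source as a fixed prior distribution.

```python
import numpy as np

def normalize(m):
    """Scale a belief message so that it sums to one."""
    return m / m.sum()

def siso_forward(P, msg_x):
    """Forward message through a SISO block: m_y(y) = sum_x P(y|x) m_x(x)."""
    return normalize(P.T @ msg_x)

def siso_backward(P, msg_y):
    """Backward message through a SISO block: m_x(x) = sum_y P(y|x) m_y(y)."""
    return normalize(P @ msg_y)

def replicator(*msgs):
    """Equality node: the outgoing message on one branch is the
    element-wise product of the messages arriving on the other branches."""
    out = np.ones_like(msgs[0])
    for m in msgs:
        out *= m
    return normalize(out)

# Tiny example: a binary Source feeding a SISO block.
prior = np.array([0.7, 0.3])              # Source message on x
P = np.array([[0.9, 0.1],                 # rows index x, columns index y
              [0.2, 0.8]])
likelihood_y = np.array([0.5, 0.5])       # uninformative evidence on y

forward = siso_forward(P, prior)                              # belief on y
posterior_x = replicator(prior, siso_backward(P, likelihood_y))
# With uninformative evidence, posterior_x equals the prior.
```

In a full FGrn network these local rules would be iterated over the quadtree, with each block updating only its own messages; that locality is what the abstract refers to when it says learning operates only within each block.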
