Generalized Alpha-Beta Divergences and Their Application to Robust Nonnegative Matrix Factorization

We propose a class of multiplicative algorithms for Nonnegative Matrix Factorization (NMF) which are robust with respect to noise and outliers. To achieve this, we formulate a new family of generalized divergences referred to as the Alpha-Beta-divergences (AB-divergences), which are parameterized by two tuning parameters, alpha and beta, and smoothly connect the fundamental Alpha-, Beta- and Gamma-divergences. By adjusting these tuning parameters, we show that a wide range of standard and new divergences can be obtained. The corresponding learning algorithms for NMF are shown to integrate and generalize many existing ones, including the Lee-Seung, ISRA (Image Space Reconstruction Algorithm), EMML (Expectation Maximization Maximum Likelihood), Alpha-NMF, and Beta-NMF algorithms. Owing to the additional degrees of freedom in tuning the parameters, the proposed family of AB-multiplicative NMF algorithms is shown to improve robustness with respect to noise and outliers. The analysis illuminates the links between the AB-divergence and other divergences, especially the Gamma- and Itakura-Saito divergences.
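The abstract does not state the divergence or the update rules explicitly. As a minimal sketch, assuming the commonly cited form of the AB-divergence for the generic case (alpha, beta and alpha + beta all nonzero) and the multiplicative update that follows from splitting its gradient into positive and negative parts, the idea could be illustrated as below. The function names `ab_divergence` and `ab_nmf_step`, the `eps` safeguard, and the toy data are hypothetical choices made for this illustration, not taken from the text.

```python
import numpy as np


def ab_divergence(P, Q, alpha, beta):
    """AB-divergence between positive arrays P and Q (assumed generic-case form).

    Covers only alpha != 0, beta != 0 and alpha + beta != 0; the limiting
    cases (e.g. Kullback-Leibler, Itakura-Saito) need separate formulas.
    """
    if alpha == 0 or beta == 0 or alpha + beta == 0:
        raise ValueError("sketch covers only alpha, beta, alpha + beta != 0")
    s = alpha + beta
    terms = P**alpha * Q**beta - (alpha / s) * P**s - (beta / s) * Q**s
    return -terms.sum() / (alpha * beta)


def ab_nmf_step(P, A, X, alpha, beta, eps=1e-12):
    """One multiplicative update of A and then X for the model P ~ A @ X.

    Assumes alpha != 0 and strictly positive P, A, X. The eps term is a
    numerical safeguard added for this sketch.
    """
    s = alpha + beta
    Q = A @ X + eps
    A = A * (((P**alpha * Q**(beta - 1)) @ X.T)
             / (Q**(s - 1) @ X.T + eps)) ** (1.0 / alpha)
    Q = A @ X + eps
    X = X * ((A.T @ (P**alpha * Q**(beta - 1)))
             / (A.T @ Q**(s - 1) + eps)) ** (1.0 / alpha)
    return A, X


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    P = rng.random((6, 5)) + 0.1   # toy positive data matrix
    A = rng.random((6, 2)) + 0.1   # toy initial factors
    X = rng.random((2, 5)) + 0.1
    for _ in range(50):
        A, X = ab_nmf_step(P, A, X, alpha=1.0, beta=1.0)
    print(ab_divergence(P, A @ X, 1.0, 1.0))
```

With alpha = beta = 1 the divergence in this sketch reduces to half the squared Euclidean distance, and the update reduces to the ISRA / Lee-Seung rule named in the abstract; other parameter pairs change how strongly large residuals are weighted.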
