Nonnegative Matrix Factorization and Probabilistic Latent Semantic Indexing: Equivalence Chi-Square Statistic, and a Hybrid Method

Non-negative Matrix Factorization (NMF) and Probabilistic Latent Semantic Indexing (PLSI) have been successfully applied to document clustering recently. In this paper, we show that PLSI and NMF optimize the same objective function, although PLSI and NMF are different algorithms as verified by experiments. This provides a theoretical basis for a new hybrid method that runs PLSI and NMF alternatively, each jumping out of local minima of the other method successively, thus achieving better final solution. Extensive experiments on 5 real-life datasets show relations between NMF and PLSI, and indicate the hybrid method lead to significant improvements over NMF-only or PLSI-only methods. We also show that at first order approximation, NMF is identical to χ2-statistic.