Binary Matrix Factorization with Applications

An interesting problem in nonnegative matrix factorization (NMF) is to factorize a matrix X that belongs to a specific class, for example, a binary matrix. In this paper, we extend standard NMF to binary matrix factorization (BMF): given a binary matrix X, we want to factorize X into two binary matrices W and H (thus conserving the most important integer property of the objective matrix X) satisfying X ≈ WH. Two algorithms are studied and compared. These methods rely on a fundamental boundedness property of NMF, which we propose and prove. This new property also provides a natural normalization scheme that eliminates the bias of the factor matrices. Experiments on both synthetic and real-world datasets are conducted to show the competency and effectiveness of BMF.
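To make the problem statement concrete, the following is a minimal Python sketch of one plausible way to obtain binary factors from a binary X. It is not the paper's algorithms, which the abstract does not specify. It assumes standard multiplicative-update NMF, a rescaling step that balances the magnitudes of W and H without changing the product WH (one reading of a normalization that removes bias between the factors), and a fixed 0.5 cutoff. The function names `nmf_multiplicative`, `balance_factors`, and `binarize`, and the threshold value, are hypothetical choices made for illustration.

```python
import numpy as np


def nmf_multiplicative(X, rank, n_iter=200, seed=0, eps=1e-9):
    """Standard multiplicative-update NMF for the squared error ||X - WH||^2."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


def balance_factors(W, H, eps=1e-12):
    """Rescale column k of W and row k of H so both have the same maximum.

    The product W @ H is unchanged because each scale is applied to W and
    its inverse to H. This is an illustrative assumption, not a scheme
    taken from the abstract.
    """
    w_max = W.max(axis=0) + eps
    h_max = H.max(axis=1) + eps
    scale = np.sqrt(h_max / w_max)
    return W * scale, H / scale[:, None]


def binarize(W, H, threshold=0.5):
    """Map real-valued factors to {0, 1} with a fixed cutoff (assumed value)."""
    return (W >= threshold).astype(int), (H >= threshold).astype(int)


# Toy binary matrix with two disjoint blocks of ones.
X = np.kron(np.eye(2, dtype=int), np.ones((3, 4), dtype=int))

W, H = nmf_multiplicative(X.astype(float), rank=2)
W, H = balance_factors(W, H)
W_bin, H_bin = binarize(W, H)

# Number of entries where the binary reconstruction disagrees with X.
error = int(np.abs(X - W_bin @ H_bin).sum())
print(error)
```

Balancing before thresholding matters because NMF factors are only determined up to a diagonal rescaling: without it, one factor can be arbitrarily large and the other arbitrarily small, so a single cutoff would not treat W and H evenly.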
