Learning with matrix factorizations

Matrices that can be factored into a product of two simpler matrices can serve as a useful and often natural model in the analysis of tabulated or high-dimensional data. Models based on matrix factorization (Factor Analysis, PCA) have been extensively used in statistical analysis and machine learning for over a century, with many new formulations and models suggested in recent years (Latent Semantic Indexing, Aspect Models, Probabilistic PCA, Exponential PCA, Non-Negative Matrix Factorization and others). In this thesis we address several issues related to learning with matrix factorizations: we study the asymptotic behavior and generalization ability of existing methods, suggest new optimization methods, and present a novel maximum-margin high-dimensional matrix factorization formulation.
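As a minimal illustration of the kind of model described above, consider the standard low-rank setting. The symbols X, U, V, n, m and k are notation assumed here for illustration and are not taken from the abstract. A data matrix is approximated by a product of two thin factors, and in the fully observed squared-error case the best such approximation is given by the truncated singular value decomposition (the Eckart-Young theorem):

```latex
% Assumed notation: X is the n-by-m data matrix, k is the factorization rank.
X \approx U V^{\top}, \qquad
U \in \mathbb{R}^{n \times k}, \quad
V \in \mathbb{R}^{m \times k}, \quad
k \ll \min(n, m)

% Squared-error fit with all entries observed:
\min_{U, V} \; \lVert X - U V^{\top} \rVert_F^2
\quad \text{is attained by} \quad
U V^{\top} = \sum_{i=1}^{k} \sigma_i \, u_i v_i^{\top},

% where \sigma_1 \ge \sigma_2 \ge \dots are the singular values of X
% and u_i, v_i are the corresponding left and right singular vectors.
```

The other models named in the abstract can be read as changing the loss, the constraints on the factors, or the set of observed entries in this basic template. This is a schematic reading and not a statement of the thesis's own formulations.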
