Learning with matrix factorizations

Matrices that can be factored into a product of two simpler matrices can serve as a useful and often natural model in the analysis of tabulated or high-dimensional data. Models based on matrix factorization (Factor Analysis, PCA) have been extensively used in statistical analysis and machine learning for over a century, with many new formulations and models suggested in recent years (Latent Semantic Indexing, Aspect Models, Probabilistic PCA, Exponential PCA, Non-Negative Matrix Factorization and others). In this thesis we address several issues related to learning with matrix factorizations: we study the asymptotic behavior and generalization ability of existing methods, suggest new optimization methods, and present a novel maximum-margin high-dimensional matrix factorization formulation. (Copies available exclusively from MIT Libraries, Rm. 14-0551, Cambridge, MA 02139-4307. Ph. 617-253-5668; Fax 617-253-1690.)

Nathan Srebro