Efficient Sampling for k-Determinantal Point Processes

Determinantal Point Processes (DPPs) are elegant probabilistic models of repulsion and diversity over discrete sets of items. However, their applicability to large sets is hindered by matrix operations whose cost is cubic in the number of items, which are required even for basic tasks such as sampling. To address this, we propose a new method for approximate sampling from discrete $k$-DPPs. Our method exploits the diversity of subsets sampled from a DPP and proceeds in two stages: first, it constructs coresets for the ground set of items; second, it efficiently samples subsets based on these coresets. Unlike previous approaches, our algorithm aims to minimize the total variation distance to the original distribution. Experiments on both synthetic and real datasets indicate that our sampling algorithm is efficient on large datasets and yields more accurate samples than previous approaches.
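
The two-stage idea can be illustrated with a minimal Python sketch. It is not the authors' algorithm. The coreset step is replaced by a hypothetical stand-in: farthest-point traversal picks $m$ representatives, and every item joins its nearest representative. The RBF kernel and its bandwidth are also assumed, and the helper names (`farthest_point_groups`, `exact_kdpp`, `two_stage_kdpp`) are invented for illustration. Stage two draws an exact $k$-DPP sample over the $m$ representatives, where $P(S) \propto \det(L_S)$ for $|S| = k$, and then picks one item uniformly from each selected group. The exact step simply enumerates all $k$-subsets, which is viable only for tiny $m$. A practical version would substitute the spectral $k$-DPP sampler of Kulesza and Taskar, whose $O(m^3)$ eigendecomposition is still far cheaper than $O(N^3)$ on the full ground set when $m \ll N$.

```python
import itertools
import math
import random


def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0  # (numerically) singular
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result  # a row swap flips the sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def rbf_kernel(points, bandwidth=1.0):
    """Assumed kernel choice: L_ij = exp(-||x_i - x_j||^2 / (2 h^2))."""
    return [[math.exp(-math.dist(p, q) ** 2 / (2 * bandwidth ** 2)) for q in points]
            for p in points]


def exact_kdpp(kernel, k, rng):
    """Exact k-DPP draw by enumeration: P(S) is proportional to det(L_S), |S| = k.
    Exponential in the ground-set size, so only usable on tiny kernels."""
    subsets = list(itertools.combinations(range(len(kernel)), k))
    # Clip tiny negative values caused by floating-point round-off.
    weights = [max(det([[kernel[i][j] for j in s] for i in s]), 0.0) for s in subsets]
    return list(rng.choices(subsets, weights=weights)[0])


def farthest_point_groups(points, m, rng):
    """Hypothetical stand-in for coreset construction: choose m centers by
    farthest-point traversal, then assign every item to its nearest center."""
    centers = [rng.randrange(len(points))]
    while len(centers) < m:
        centers.append(max(range(len(points)),
                           key=lambda i: min(math.dist(points[i], points[c]) for c in centers)))
    groups = {c: [] for c in centers}
    for i, p in enumerate(points):
        nearest = min(centers, key=lambda c: math.dist(p, points[c]))
        groups[nearest].append(i)
    return centers, groups


def two_stage_kdpp(points, k, m, seed=0):
    """Stage 1: shrink N items to m representatives. Stage 2: exact k-DPP on the
    m x m representative kernel, then one uniform item from each chosen group."""
    rng = random.Random(seed)
    centers, groups = farthest_point_groups(points, m, rng)
    kernel = rbf_kernel([points[c] for c in centers])
    chosen = exact_kdpp(kernel, k, rng)
    return sorted(rng.choice(groups[centers[j]]) for j in chosen)
```

For example, `two_stage_kdpp([(x, y) for x in range(5) for y in range(5)], k=3, m=6)` returns three distinct indices into the 25 grid points, drawn from three different groups. Because near-duplicate representatives give a small $\det(L_S)$, they are rarely selected together, which is the repulsion a DPP is meant to provide.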
