Learning Determinantal Point Processes by Sampling Inferred Negatives

Determinantal Point Processes (DPPs) have attracted significant interest from the machine-learning community due to their ability to elegantly and tractably model the delicate balance between quality and diversity of sets. We consider the key task of learning DPPs from data, for which we introduce a novel optimization problem, Contrastive Estimation (CE), that encodes information about "negative" samples into the basic learning model. CE is grounded in the successful use of negative information in machine vision and language modeling. Depending on the chosen negative distribution (which may be static or evolve during optimization), CE takes two different forms, which we analyze theoretically and experimentally. We evaluate our new model on real-world datasets; on a challenging dataset, CE learning delivers a considerable improvement in predictive performance over a DPP learned without contrastive information.
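
The abstract does not spell out the CE objective, so the following is only a minimal sketch under stated assumptions. It assumes an L-ensemble parameterization, where a set Y has probability det(L_Y) / det(L + I), and a simple weighted-difference form: the mean log-likelihood of observed ("positive") sets minus a weight times the mean log-likelihood of negative sets. The names `dpp_log_prob`, `contrastive_objective`, and the `weight` parameter are hypothetical, and this is not necessarily the exact objective analyzed in the paper.

```python
import numpy as np


def dpp_log_prob(L, subset):
    """Log-probability of `subset` under an L-ensemble DPP:
    log det(L_Y) - log det(L + I)."""
    L = np.asarray(L, dtype=float)
    idx = np.array(sorted(subset), dtype=int)
    if idx.size == 0:
        logdet_sub = 0.0  # the empty 0x0 minor has determinant 1 by convention
    else:
        # For PSD L a principal minor is >= 0, so its sign is +1 (or 0 when
        # singular, in which case slogdet returns -inf); the sign is ignored.
        _, logdet_sub = np.linalg.slogdet(L[np.ix_(idx, idx)])
    # Normalizer: the sum of all principal minors of L equals det(L + I).
    _, logdet_norm = np.linalg.slogdet(L + np.eye(L.shape[0]))
    return logdet_sub - logdet_norm


def contrastive_objective(L, positives, negatives, weight=1.0):
    """Hypothetical CE-style objective (to be maximized over L): reward the
    observed sets and penalize the negative sets, scaled by `weight`."""
    pos = np.mean([dpp_log_prob(L, Y) for Y in positives])
    neg = np.mean([dpp_log_prob(L, Y) for Y in negatives]) if negatives else 0.0
    return pos - weight * neg


if __name__ == "__main__":
    L = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.3],
                  [0.0, 0.3, 1.5]])
    observed = [{0, 2}, {1}]
    # Hypothetical static negatives: sets assumed to be implausible.
    negatives = [{0, 1, 2}]
    print(contrastive_objective(L, observed, negatives, weight=0.5))
```

In this naive form, fixed negatives play the role of a static negative distribution; an evolving distribution would resample `negatives` between optimization steps. This weighted difference can also be unbounded above (driving a negative set's probability toward zero sends the penalty term to infinity), so a practical variant would need regularization or a bounded formulation.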
