From Group to Individual Labels Using Deep Features

In many classification problems, labels are relatively scarce. One setting in which this occurs is when labels are available for groups of instances but not for the instances themselves, as in multi-instance learning. Past work on this problem has typically focused on learning classifiers that make predictions at the group level. In this paper we focus instead on learning classifiers that make predictions at the instance level. To achieve this, we propose a new objective function that encourages smoothness of the inferred instance-level labels based on instance-level similarity, while at the same time respecting group-level label constraints. We apply this approach to the problem of predicting labels for sentences given labels for reviews, using a convolutional neural network to infer sentence similarity. The approach is evaluated on three large review data sets from IMDB, Yelp, and Amazon, and we demonstrate that it is both accurate and scalable compared to various alternatives.
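The abstract does not give the exact form of the objective, so the following is only a minimal sketch of the idea under stated assumptions: a smoothness term that penalizes differing scores for similar instances, plus a penalty that ties each group's mean instance score to its group label. The function name `group_instance_cost`, the squared-error forms, the use of the group mean as the aggregate, and the weight `lam` are all illustrative choices, not the paper's definitions.

```python
import numpy as np

def group_instance_cost(y, W, groups, group_labels, lam=1.0):
    """Illustrative cost: similarity-weighted smoothness + group-label penalty.

    y            : inferred instance-level scores, shape (n,)
    W            : instance similarity matrix, shape (n, n) (assumed given)
    groups       : list of index lists, one per group
    group_labels : known label for each group
    lam          : hypothetical trade-off weight between the two terms
    """
    y = np.asarray(y, dtype=float)
    W = np.asarray(W, dtype=float)
    n = len(y)

    # Smoothness: similar instances (large W[i, j]) should get similar scores.
    diff = y[:, None] - y[None, :]
    smooth = np.sum(W * diff ** 2) / (n * n)

    # Group constraint: the mean instance score of a group (an assumed
    # aggregation rule) should match the observed group label.
    penalty = 0.0
    for idx, target in zip(groups, group_labels):
        penalty += (y[idx].mean() - target) ** 2
    penalty /= len(groups)

    return smooth + lam * penalty

# Toy data: instances 0,1 are similar to each other, as are 2,3.
W = np.array([[1.0, 0.9, 0.1, 0.1],
              [0.9, 1.0, 0.1, 0.1],
              [0.1, 0.1, 1.0, 0.9],
              [0.1, 0.1, 0.9, 1.0]])
groups = [[0, 1], [2, 3]]      # e.g. two reviews of two sentences each
group_labels = [1.0, 0.0]      # one positive review, one negative review

cost_good = group_instance_cost([1, 1, 0, 0], W, groups, group_labels)
cost_bad = group_instance_cost([1, 0, 1, 0], W, groups, group_labels)
```

In this toy setting, the assignment that agrees with both the similarity structure and the group labels receives the lower cost, which is the behavior the objective is meant to encourage. In practice the scores would come from a trainable classifier and the cost would be minimized with respect to its parameters.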
