SGM: Sequence Generation Model for Multi-label Classification

Multi-label classification is an important yet challenging task in natural language processing. It is more complex than single-label classification because the labels tend to be correlated, yet existing methods largely ignore these correlations. Moreover, different parts of the text contribute differently when predicting different labels, which existing models also fail to consider. In this paper, we propose to view the multi-label classification task as a sequence generation problem, and apply a sequence generation model with a novel decoder structure to solve it. Extensive experimental results show that our proposed methods outperform previous work by a substantial margin. Further analysis demonstrates that the proposed methods not only capture the correlations between labels, but also automatically select the most informative words when predicting different labels.
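The core idea above, emitting labels one at a time so that each prediction can condition on the labels already produced, can be illustrated with a toy sketch. This is not the trained SGM model (which uses an attention-based LSTM decoder); it is a minimal greedy decoder over hand-set, hypothetical conditional scores, chosen only to show how sequential generation captures a label correlation that independent per-label classifiers would miss.

```python
# Toy sketch: multi-label classification as sequence generation.
# Labels are emitted one at a time, each conditioned on the previous label,
# until a special end-of-sequence token. All scores below are hand-set
# illustrations for a single hypothetical input, not learned parameters.

EOS = "<eos>"

# Hypothetical conditional scores P(next_label | previous_label).
# "sports" strongly suggests "football" next -- a label correlation
# that per-label binary classifiers cannot express.
TRANSITIONS = {
    "<bos>":    {"sports": 0.6, "politics": 0.3, "football": 0.1, EOS: 0.0},
    "sports":   {"football": 0.7, "politics": 0.1, EOS: 0.2},
    "politics": {"sports": 0.2, "football": 0.1, EOS: 0.7},
    "football": {EOS: 0.9, "politics": 0.1},
}

def greedy_decode(transitions, max_len=5):
    """Greedily emit a label sequence, masking labels already produced
    so each label appears at most once (set-style prediction)."""
    labels, prev = [], "<bos>"
    for _ in range(max_len):
        scores = dict(transitions.get(prev, {EOS: 1.0}))
        for seen in labels:          # mask duplicate labels
            scores.pop(seen, None)
        nxt = max(scores, key=scores.get)
        if nxt == EOS:
            break
        labels.append(nxt)
        prev = nxt
    return labels

print(greedy_decode(TRANSITIONS))  # -> ['sports', 'football']
```

In a real sequence generation model, the transition table is replaced by a decoder network whose attention over the input text lets different words drive different label predictions, which is the second property the abstract highlights.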
