Active Learning for Classification with Maximum Model Change

Most existing active learning studies focus on designing sample selection algorithms, yet several fundamental problems deserve investigation to provide deeper insight into active learning. In this article, we conduct an in-depth study of active learning for classification from the perspective of model change. We derive a general active learning framework for classification called maximum model change (MMC), which aims to query the most influential examples. Model change is quantified as the difference between the model parameters before and after training with the expanded training set. Inspired by the stochastic gradient update rule, the gradient of the loss with respect to a candidate example is used to approximate the model change. We apply this framework to two popular classifiers, support vector machines and logistic regression. We analyze the convergence property of MMC and justify it theoretically, and we explore the connection between MMC and uncertainty-based sampling to provide a unified view. In addition, we discuss its potential applicability to other learning models and to a wide range of applications. We validate the MMC strategy on two kinds of benchmark datasets, the UCI repository and ImageNet, and show that it outperforms many state-of-the-art methods.
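To make the gradient-based approximation concrete, the following is a minimal sketch for the logistic regression case. It assumes a binary model with probability p = sigmoid(w·x); a single SGD step on a candidate x with label y moves the parameters by a quantity proportional to (p − y)·x, so the expected gradient norm under the model's own label distribution collapses to 2p(1 − p)·‖x‖. The function names (`mmc_scores`, `select_query`) are illustrative, not from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mmc_scores(w, X):
    """Expected model change for binary logistic regression.

    For a candidate x with unknown label y in {0, 1}, the SGD update
    shifts w by eta * (p - y) * x with p = sigmoid(w @ x).  Taking the
    expectation of the gradient norm over y ~ Bernoulli(p) gives
    p * (1 - p) * ||x|| + (1 - p) * p * ||x|| = 2 * p * (1 - p) * ||x||.
    """
    p = sigmoid(X @ w)
    return 2.0 * p * (1.0 - p) * np.linalg.norm(X, axis=1)

def select_query(w, X_pool):
    """Pick the unlabeled example with the largest expected model change."""
    return int(np.argmax(mmc_scores(w, X_pool)))
```

The score is maximized when p = 0.5 and is weighted by the example's norm, which makes the connection between MMC and uncertainty-based sampling visible: MMC prefers uncertain examples, scaled by their potential to move the parameters.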
