Online Passive-Aggressive Algorithms

We present a unified view of online classification, regression, and uni-class problems. This view leads to a single algorithmic framework for the three problems. We prove worst-case loss bounds for several algorithms in both the realizable and the non-realizable case. We also discuss a conversion of our main online algorithm to the batch learning setting. The end result is a set of new algorithms with accompanying loss bounds for the hinge loss.
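
To make the framework concrete, the following is a minimal sketch of one passive-aggressive step for binary classification with the hinge loss, assuming the basic variant without a slack or aggressiveness parameter. The function names `hinge_loss` and `pa_update` and the plain-list vector representation are illustrative choices, not taken from the text above. The step is passive when the current example already has margin at least 1, and otherwise moves the weights just far enough to bring the hinge loss on that example to zero.

```python
def hinge_loss(w, x, y):
    """Hinge loss max(0, 1 - y * <w, x>) for a label y in {-1, +1}."""
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    return max(0.0, 1.0 - margin)


def pa_update(w, x, y):
    """Return the weights after one passive-aggressive step (illustrative sketch)."""
    loss = hinge_loss(w, x, y)
    sq_norm = sum(xi * xi for xi in x)
    if loss == 0.0 or sq_norm == 0.0:
        # Passive: the margin constraint already holds (or x is the zero vector).
        return list(w)
    # Aggressive: smallest step that brings the hinge loss on (x, y) to zero.
    tau = loss / sq_norm
    return [wi + tau * y * xi for wi, xi in zip(w, x)]


# Usage: start from zero weights and process a single example.
w = pa_update([0.0, 0.0], [1.0, 2.0], 1)
```

After this single step the example `([1.0, 2.0], 1)` has margin 1, so presenting it again leaves the weights unchanged.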
