Path Kernels and Multiplicative Updates

We consider a natural convolution kernel defined by a directed graph. Each edge contributes an input; the inputs along a path are multiplied together, and the resulting products are summed over all paths. The edges also carry probabilities, chosen so that the total outflow from each node is one. We then discuss multiplicative updates on these graphs, where the prediction is essentially a kernel computation and the update contributes a factor to each edge. After such an update, the total outflow from each node is no longer one. However, there are algorithms that re-normalize the weights on the paths so that the total outflow from each node is one again. Finally, we discuss the use of regular expressions for speeding up the kernel and re-normalization computations. In particular, we rewrite the multiplicative algorithms that predict as well as the best pruning of a series-parallel graph in terms of efficient kernel computations.
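
The sum over all paths and the re-normalization step can be illustrated with a minimal sketch. It assumes a directed acyclic graph with a single sink, in which every node can reach the sink. The function names (`topological_order`, `backward_sums`, `path_sum`, `renormalize`) are hypothetical, and the re-normalization shown is the standard backward-sum ("weight pushing") technique, which may differ from the algorithms the paper has in mind.

```python
from collections import defaultdict


def topological_order(edges):
    """Return the nodes of a DAG in topological order (Kahn's algorithm)."""
    succ = defaultdict(list)
    indeg = defaultdict(int)
    nodes = set()
    for (u, v) in edges:
        succ[u].append(v)
        indeg[v] += 1
        nodes.update((u, v))
    ready = [n for n in nodes if indeg[n] == 0]
    order = []
    while ready:
        u = ready.pop()
        order.append(u)
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    return order


def backward_sums(edges, sink):
    """Z[u] = sum over all paths from u to sink of the product of edge values.

    `edges` maps (u, v) pairs to numbers. Assumption: every node reaches sink.
    """
    out = defaultdict(list)
    for (u, v), value in edges.items():
        out[u].append((v, value))
    Z = {}
    # Visit nodes sink-first so each successor's sum is already available.
    for u in reversed(topological_order(edges)):
        Z[u] = 1.0 if u == sink else sum(val * Z[v] for v, val in out[u])
    return Z


def path_sum(edges, source, sink):
    """Sum over all source-to-sink paths of the product of edge values."""
    return backward_sums(edges, sink)[source]


def renormalize(weights, sink):
    """Rescale edge weights so the outflow of every non-sink node is one.

    Each path's weight is divided by the same constant Z[source], so the
    ratios between path weights are unchanged.
    """
    Z = backward_sums(weights, sink)
    return {(u, v): w * Z[v] / Z[u] for (u, v), w in weights.items()}
```

To compute a prediction, call `path_sum` on the edge-wise products of weights and inputs; to compute a kernel between two input vectors, call it on the edge-wise products of the two inputs. As a small check of `renormalize`, take a diamond graph with edges s→a, s→b, a→t, b→t, weights 0.5, 0.5, 1, 1, and an update that doubles the weight on a→t. Re-normalizing moves that factor upstream: s→a becomes 2/3, s→b becomes 1/3, and both edges into t are back to 1.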
