Path Kernels and Multiplicative Updates

We consider a natural convolution kernel defined by a directed graph. Each edge contributes an input. The inputs along a path form a product, and the products for all paths are summed. We also have a set of probabilities on the edges so that the outflow from each node is one. We then discuss multiplicative updates on these graphs, where the prediction is essentially a kernel computation and the update contributes a factor to each edge. After such an update, the total outflow from each node is no longer one. However, some clever algorithms re-normalize the weights on the paths so that the total outflow from each node is one again. Finally, we discuss the use of regular expressions for speeding up the kernel and re-normalization computations. In particular, we rewrite the multiplicative algorithms that predict as well as the best pruning of a series-parallel graph in terms of efficient kernel computations.
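The path sum and the re-normalization step described in the abstract can be illustrated with a minimal Python sketch. It rests on assumptions that the abstract does not state: the graph is acyclic with a single source and a single sink, every node lies on some source-to-sink path, and there are no parallel edges. The function names (`path_sum`, `path_kernel`, `renormalize`) are hypothetical, and the re-normalization shown is the standard "weight pushing" idea, which may differ in detail from the algorithms the paper has in mind.

```python
from collections import defaultdict


def topological_order(edges):
    """Return the nodes of a DAG in topological order (Kahn's algorithm)."""
    succ, indeg, nodes = defaultdict(list), defaultdict(int), set()
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
        nodes.update((u, v))
    ready = sorted(n for n in nodes if indeg[n] == 0)
    order = []
    while ready:
        u = ready.pop()
        order.append(u)
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    return order


def path_sum(edges, values, source, sink):
    """Sum, over all source-to-sink paths, of the product of edge values.

    Also returns the per-node totals (sum over paths from that node to sink).
    Runs in time linear in the number of edges instead of enumerating paths.
    """
    out = defaultdict(list)
    for u, v in edges:
        out[u].append(v)
    total = {sink: 1.0}
    # Process nodes sink-first so every successor total is already known.
    for u in reversed(topological_order(edges)):
        if u != sink:
            total[u] = sum(values[(u, v)] * total[v] for v in out[u])
    return total[source], total


def path_kernel(edges, x, y, source, sink):
    """Kernel of two edge-input vectors: path sum of the products x_e * y_e."""
    return path_sum(edges, {e: x[e] * y[e] for e in edges}, source, sink)[0]


def renormalize(edges, weights, source, sink):
    """Rescale edge weights so each node's outflow is one again.

    Each path keeps its weight relative to the other paths (weight pushing).
    """
    _, z = path_sum(edges, weights, source, sink)
    return {(u, v): weights[(u, v)] * z[v] / z[u] for (u, v) in edges}


# Hypothetical diamond graph with two paths: s-a-t and s-b-t.
edges = [("s", "a"), ("s", "b"), ("a", "t"), ("b", "t")]
x = {("s", "a"): 2.0, ("a", "t"): 3.0, ("s", "b"): 1.0, ("b", "t"): 4.0}
ones = {e: 1.0 for e in edges}
kernel_value = path_kernel(edges, x, ones, "s", "t")  # 2*3 + 1*4

# Edge probabilities after one edge was multiplied by an update factor of 0.5.
updated = {("s", "a"): 0.5, ("s", "b"): 0.5, ("a", "t"): 0.5, ("b", "t"): 1.0}
pushed = renormalize(edges, updated, "s", "t")
```

In this toy example the outflow of node `a` drops to 0.5 after the update; `renormalize` restores outflow one at every node while the two paths keep their 1:2 weight ratio.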

Manfred K. Warmuth | Eiji Takimoto
