Evolino for recurrent support vector machines

Traditional Support Vector Machines (SVMs) need pre-wired finite time windows to predict and classify time series. They do not have an internal state necessar y to deal with sequences involving arbitrary long-term dependencies. Here we introduce a new class of recurrent, truly sequential SVM-like devices with internal adaptive states, trained by a novel method called EVOlution of systems with KErnel-based outputs (Evoke), an instance of the recent Evolino class of methods [1, 2]. Evoke evolves recurrent neural networks to detect and represent temporal dependencies while using quadratic programming/support vector regression to produce precise outputs, in contrast t o our recent work [1, 2] which instead uses pseudoinverse regression. Evoke is the first SVM-based mechanis m learning to classify a context-sensitive language. It also outperforms recent state-of-the-art gra dient-based recurrent neural networks (RNNs) on various time series prediction tasks.

[1]  R. Penrose A Generalized inverse for matrices , 1955 .

[2]  Ingo Rechenberg,et al.  Evolutionsstrategie : Optimierung technischer Systeme nach Prinzipien der biologischen Evolution , 1973 .

[3]  P. Werbos,et al.  Beyond Regression : "New Tools for Prediction and Analysis in the Behavioral Sciences , 1974 .

[4]  W. Vent,et al.  Rechenberg, Ingo, Evolutionsstrategie — Optimierung technischer Systeme nach Prinzipien der biologischen Evolution. 170 S. mit 36 Abb. Frommann‐Holzboog‐Verlag. Stuttgart 1973. Broschiert , 1975 .

[5]  John H. Holland,et al.  Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence , 1992 .

[6]  Editors , 1986, Brain Research Bulletin.

[7]  Xin Yao,et al.  A review of evolutionary artificial neural networks , 1993, Int. J. Intell. Syst..

[8]  Yoshua Bengio,et al.  Diffusion of Credit in Markovian Models , 1994, NIPS.

[9]  Barak A. Pearlmutter Gradient calculations for dynamic recurrent neural networks: a survey , 1995, IEEE Trans. Neural Networks.

[10]  Vladimir Vapnik,et al.  The Nature of Statistical Learning , 1995 .

[11]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[12]  F. Girosi,et al.  Nonlinear prediction of chaotic time series using support vector machines , 1997, Neural Networks for Signal Processing VII. Proceedings of the 1997 IEEE Signal Processing Society Workshop.

[13]  Gunnar Rätsch,et al.  Predicting Time Series with Support Vector Machines , 1997, ICANN.

[14]  David Haussler,et al.  Exploiting Generative Models in Discriminative Classifiers , 1998, NIPS.

[15]  Jürgen Schmidhuber,et al.  Learning to Forget: Continual Prediction with LSTM , 2000, Neural Computation.

[16]  Vladimir N. Vapnik,et al.  The Nature of Statistical Learning Theory , 2000, Statistics for Engineering and Information Science.

[17]  J. Suykens,et al.  Recurrent least squares support vector machines , 2000 .

[18]  Nello Cristianini,et al.  Classification using String Kernels , 2000 .

[19]  Shigeki Sagayama,et al.  Dynamic Time-Alignment Kernel in Support Vector Machine , 2001, NIPS.

[20]  Michael C. Mozer,et al.  A Discrete Probabilistic Memory Model for Discovering Dependencies in Time , 2001, ICANN.

[21]  Jürgen Schmidhuber,et al.  LSTM recurrent networks learn simple context-free and context-sensitive languages , 2001, IEEE Trans. Neural Networks.

[22]  Jürgen Schmidhuber,et al.  Learning Precise Timing with LSTM Recurrent Networks , 2003, J. Mach. Learn. Res..

[23]  Samy Bengio,et al.  Torch: a modular machine learning software library , 2002 .

[24]  Jürgen Schmidhuber,et al.  Learning Nonregular Languages: A Comparison of Simple Recurrent Networks and LSTM , 2002, Neural Computation.

[25]  Simon King,et al.  Framewise phone classification using support vector machines , 2002, INTERSPEECH.

[26]  Risto Miikkulainen,et al.  Active Guidance for a Finless Rocket Using Neuroevolution , 2003, GECCO.

[27]  Risto Miikkulainen,et al.  Robust non-linear control through neuroevolution , 2003 .

[28]  Jürgen Schmidhuber,et al.  Kalman filters improve LSTM network performance in problems unsolvable by traditional recurrent nets , 2003, Neural Networks.

[29]  Harald Haas,et al.  Harnessing Nonlinearity: Predicting Chaotic Systems and Saving Energy in Wireless Communication , 2004, Science.

[30]  Tony Jebara,et al.  Probability Product Kernels , 2004, J. Mach. Learn. Res..

[31]  Jürgen Schmidhuber,et al.  Framewise phoneme classification with bidirectional LSTM and other neural network architectures , 2005, Neural Networks.

[32]  A. Ijspeert,et al.  Associative Memory Models Based on Coupled Oscillators Semester Project , 2005 .

[33]  Jürgen Schmidhuber,et al.  Evolino: Hybrid Neuroevolution / Optimal Linear Search for Sequence Prediction , 2005, IJCAI 2005.

[34]  Philipp Slusallek,et al.  Introduction to real-time ray tracing , 2005, SIGGRAPH Courses.

[35]  Jürgen Schmidhuber,et al.  Co-evolving recurrent neurons learn deep memory POMDPs , 2005, GECCO '05.

[36]  A. Ijspeert,et al.  Dynamic hebbian learning in adaptive frequency oscillators , 2006 .