Sequence Labelling in Structured Domains with Hierarchical Recurrent Neural Networks

Modelling data in structured domains requires establishing the relations among patterns at multiple scales. When these patterns arise from sequential data, the multiscale structure also contains a dynamic component that must be modelled, particularly, as is often the case, if the data is unsegmented. Probabilistic graphical models are the predominant framework for labelling unsegmented sequential data in structured domains. Their use requires a certain degree of a priori knowledge about the relations among patterns and about the patterns themselves. This paper presents a hierarchical system, based on the connectionist temporal classification algorithm, for labelling unsegmented sequential data at multiple scales with recurrent neural networks only. Experiments on the recognition of sequences of spoken digits show that the system outperforms hidden Markov models, while making fewer assumptions about the domain.

[1]  John Scott Bridle,et al.  Probabilistic Interpretation of Feedforward Classification Network Outputs, with Relationships to Statistical Pattern Recognition , 1989, NATO Neurocomputing.

[2]  Lawrence R. Rabiner,et al.  A tutorial on hidden Markov models and selected applications in speech recognition , 1989, Proc. IEEE.

[3]  Paul J. Werbos,et al.  Backpropagation Through Time: What It Does and How to Do It , 1990, Proc. IEEE.

[4]  Ian H. Witten,et al.  Identifying Hierarchical Structure in Sequences: A linear-time algorithm , 1997, J. Artif. Intell. Res..

[5]  Kuldip K. Paliwal,et al.  Bidirectional recurrent neural networks , 1997, IEEE Trans. Signal Process..

[6]  Treebank Penn,et al.  Linguistic Data Consortium , 1999 .

[7]  Andrew McCallum,et al.  Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data , 2001, ICML.

[8]  Neil A. Dodgson,et al.  Proceedings Ninth IEEE International Conference on Computer Vision , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[9]  Jürgen Schmidhuber,et al.  Framewise phoneme classification with bidirectional LSTM and other neural network architectures , 2005, Neural Networks.

[10]  Jürgen Schmidhuber,et al.  Bidirectional LSTM Networks for Improved Phoneme Classification and Recognition , 2005, ICANN.

[11]  Martial Hebert,et al.  A hierarchical field framework for unified context-based classification , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[12]  Steve Young,et al.  The HTK book version 3.4 , 2006 .

[13]  Jürgen Schmidhuber,et al.  Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks , 2006, ICML.