Weighted Rational Transductions and Their Application to Human Language Processing

We present the concepts of weighted language, transduction and automaton from algebraic automata theory as a general framework for describing and implementing decoding cascades in speech and language processing. This generality allows us to represent uniformly such information sources as pronunciation dictionaries, language models and lattices, and to use uniform algorithms both for building decoding stages and for optimizing and combining them. In particular, a single automata join algorithm can be used to combine information sources such as a pronunciation dictionary and a context-dependency model either offline, during the construction of a decoder, or dynamically, during the operation of the decoder. Applications to speech recognition and to Chinese text segmentation are discussed.
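The point about one combination algorithm serving both offline construction and on-the-fly decoding can be illustrated with a minimal sketch. It assumes that the join behaves like weighted transducer composition, that weights live in the tropical semiring (min, +), and that the transducers are epsilon-free (real composition also needs epsilon handling). The names `WFST`, `LazyCompose`, `expand` and `best_output`, along with the toy data, are hypothetical illustrations and not the paper's implementation.

```python
from collections import deque
from dataclasses import dataclass, field

# Assumed tropical semiring: "plus" is min (keep the best path), "times" is + (add costs).
ZERO = float("inf")   # semiring zero: no path
ONE = 0.0             # semiring one: zero cost

@dataclass
class WFST:
    """Hypothetical epsilon-free weighted transducer over the tropical semiring."""
    start: int
    finals: dict                               # state -> final weight
    arcs: dict = field(default_factory=dict)   # state -> [(in, out, weight, next)]

    def out_arcs(self, q):
        return self.arcs.get(q, [])

    def final_weight(self, q):
        return self.finals.get(q, ZERO)

class LazyCompose:
    """Composition T1 o T2 whose pair states (q1, q2) are built only when visited."""
    def __init__(self, t1, t2):
        self.t1, self.t2 = t1, t2
        self.start = (t1.start, t2.start)
        self._cache = {}

    def final_weight(self, q):
        q1, q2 = q
        return self.t1.final_weight(q1) + self.t2.final_weight(q2)

    def out_arcs(self, q):
        if q not in self._cache:
            q1, q2 = q
            # Match T1's output label against T2's input label; semiring "times" adds costs.
            self._cache[q] = [
                (a, c, w1 + w2, (n1, n2))
                for (a, b, w1, n1) in self.t1.out_arcs(q1)
                for (b2, c, w2, n2) in self.t2.out_arcs(q2)
                if b == b2
            ]
        return self._cache[q]

def expand(machine):
    """Offline use: visit every reachable state and build an explicit WFST."""
    index = {machine.start: 0}
    arcs, finals = {}, {}
    queue = deque([machine.start])
    while queue:
        q = queue.popleft()
        i = index[q]
        fw = machine.final_weight(q)
        if fw != ZERO:
            finals[i] = fw
        for a, c, w, nq in machine.out_arcs(q):
            if nq not in index:
                index[nq] = len(index)
                queue.append(nq)
            arcs.setdefault(i, []).append((a, c, w, index[nq]))
    return WFST(start=0, finals=finals, arcs=arcs)

def best_output(machine, inputs):
    """Online use: follow only arcs matching the input, keeping the best path per state."""
    frontier = {machine.start: (ONE, ())}   # state -> (cost, output so far)
    for sym in inputs:
        nxt = {}
        for q, (cost, out) in frontier.items():
            for a, c, w, nq in machine.out_arcs(q):
                if a == sym and (nq not in nxt or cost + w < nxt[nq][0]):
                    nxt[nq] = (cost + w, out + (c,))
        frontier = nxt
    finished = [(cost + machine.final_weight(q), out) for q, (cost, out) in frontier.items()]
    finished = [f for f in finished if f[0] != ZERO]
    return min(finished) if finished else None

# Hypothetical toy data: T1 rewrites "a" as "x" (cost 1) or "y" (cost 0.5);
# T2 maps "x" to "X" (cost 2) and "y" to "Y" (cost 3).
t1 = WFST(start=0, finals={1: ONE}, arcs={0: [("a", "x", 1.0, 1), ("a", "y", 0.5, 1)]})
t2 = WFST(start=0, finals={1: ONE}, arcs={0: [("x", "X", 2.0, 1), ("y", "Y", 3.0, 1)]})

lazy = LazyCompose(t1, t2)
print(best_output(lazy, ["a"]))           # (3.0, ('X',)) computed on demand
print(best_output(expand(lazy), ["a"]))   # (3.0, ('X',)) from the precomputed machine
```

Both calls run the same `out_arcs` matching step. The only difference is whether the reachable pair states are enumerated ahead of time by `expand` or materialized only as the input demands them, which mirrors the static versus dynamic use described above.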
