Discriminative kernel-based phoneme sequence recognition

We describe a new method for phoneme sequence recognition given a speech utterance, which is not based on the HMM. In contrast to HMM-based approaches, our method uses a discriminative kernel-based training procedure in which the learning process is tailored to the goal of minimizing the Levenshtein distance between the predicted phoneme sequence and the correct sequence. The phoneme sequence predictor is devised by mapping the speech utterance along with a proposed phoneme sequence to a vector-space endowed with an inner-product that is realized by a Mercer kernel. Building on large margin techniques for predicting whole sequences, we are able to devise a learning algorithm which distills to separating the correct phoneme sequence from all other sequences. We describe an iterative algorithm for learning the phoneme sequence recognizer and further describe an efficient implementation of it. We present initial encouraging experimental results with the TIMIT and compare the proposed method to an HMM-based approach.

[1]  Yoram Singer,et al.  An Online Algorithm for Hierarchical Phoneme Classification , 2004, MLMI.

[2]  Yoram Singer,et al.  Learning to Align Polyphonic Music , 2004, ISMIR.

[3]  Claudio Gentile,et al.  On the generalization ability of on-line learning algorithms , 2001, IEEE Transactions on Information Theory.

[4]  Koby Crammer,et al.  Online Passive-Aggressive Algorithms , 2003, J. Mach. Learn. Res..

[5]  Hsiao-Wuen Hon,et al.  Speaker-independent phone recognition using hidden Markov models , 1989, IEEE Trans. Acoust. Speech Signal Process..

[6]  Simon King,et al.  Framewise phone classification using support vector machines , 2002, INTERSPEECH.

[7]  Steve Young,et al.  A review of large-vocabulary continuous-speech recognition , 1996 .

[8]  Thomas Hofmann,et al.  Support vector machine learning for interdependent and structured output spaces , 2004, ICML.

[9]  Vladimir Vapnik,et al.  Statistical learning theory , 1998 .

[10]  Samy Bengio,et al.  Torch: a modular machine learning software library , 2002 .

[11]  Ben-Zion Bobrovsky,et al.  Plosive spotting with margin classifiers , 2001, INTERSPEECH.

[12]  Michael Collins,et al.  Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms , 2002, EMNLP.

[13]  Ben Taskar,et al.  Max-Margin Markov Networks , 2003, NIPS.

[14]  Biing-Hwang Juang,et al.  Fundamentals of speech recognition , 1993, Prentice Hall signal processing series.

[15]  Mari Ostendorf,et al.  Fast algorithms for phone classification and recognition using segment-based models , 1992, IEEE Trans. Signal Process..

[16]  Yoram Singer,et al.  Phoneme alignment based on discriminative learning , 2005, INTERSPEECH.

[17]  Mari Ostendorf,et al.  From HMM's to segment models: a unified view of stochastic modeling for speech recognition , 1996, IEEE Trans. Speech Audio Process..

[18]  Nello Cristianini,et al.  An introduction to Support Vector Machines , 2000 .

[19]  Li Deng,et al.  Speech trajectory discrimination using the minimum classification error learning , 1998, IEEE Trans. Speech Audio Process..