Learning the hidden structure of speech.

In the work described here, the backpropagation neural network learning procedure is applied to the analysis and recognition of speech. This procedure takes a set of input/output pattern pairs and attempts to learn their functional relationship, developing the necessary representational features in the course of learning. A series of computer simulation studies was carried out to assess the ability of these networks to label sounds accurately, to learn to recognize sounds without labels, and to learn feature representations of continuous speech. These studies demonstrated that the networks can learn to label presegmented test tokens with accuracies of up to 95%. Networks trained on segmented sounds using a strategy that requires no external labels were able to recognize and delineate sounds in continuous speech. These networks developed rich internal representations that included units corresponding to such traditional distinctions as vowels and consonants, as well as units sensitive to novel and nonstandard features. Networks trained on a large corpus of unsegmented, continuous speech without labels also developed feature representations that may prove useful for both segmentation and label learning. The results of these studies, while preliminary, demonstrate that backpropagation learning can be used with complex, natural data to identify a feature structure that can serve as the basis for both analysis and nontrivial pattern recognition.
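To make the labeling setup concrete, the following is a minimal sketch of a one-hidden-layer network trained by backpropagation to map input patterns (for example, spectral frames of presegmented speech tokens) onto output labels. The layer sizes, the use of sigmoid units with a squared-error measure, and the synthetic random data are illustrative assumptions, not the configuration used in the studies described above.

# Minimal backpropagation sketch: input patterns -> hidden layer -> label outputs.
# All sizes and data below are hypothetical placeholders.
import numpy as np

rng = np.random.default_rng(0)

n_in, n_hidden, n_out = 16, 8, 4          # e.g. 16 spectral coefficients, 4 sound classes
X = rng.standard_normal((200, n_in))      # hypothetical input patterns
labels = rng.integers(0, n_out, 200)      # hypothetical target labels
T = np.eye(n_out)[labels]                 # one-of-n target vectors

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Small random initial weights; the extra row in each matrix holds the bias weights.
W1 = rng.uniform(-0.1, 0.1, (n_in + 1, n_hidden))
W2 = rng.uniform(-0.1, 0.1, (n_hidden + 1, n_out))
lr = 0.1

for epoch in range(500):
    # Forward pass: input -> hidden -> output, sigmoid units throughout.
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    H = sigmoid(Xb @ W1)
    Hb = np.hstack([H, np.ones((H.shape[0], 1))])
    Y = sigmoid(Hb @ W2)

    # Backward pass: propagate squared-error derivatives back through the sigmoids
    # and take a gradient-descent step on both weight matrices.
    dY = (Y - T) * Y * (1 - Y)
    dH = (dY @ W2[:-1].T) * H * (1 - H)
    W2 -= lr * Hb.T @ dY / len(X)
    W1 -= lr * Xb.T @ dH / len(X)

accuracy = np.mean(Y.argmax(axis=1) == labels)   # labeling accuracy on the training set
print(f"training-set labeling accuracy: {accuracy:.2f}")

With real presegmented tokens in place of the random arrays, the hidden-unit activations H are the learned internal representation whose structure (vowel/consonant-like units and other features) is the subject of the analyses reported here.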