Parallel Networks that Learn to Pronounce

This paper describes NETtalk, a class of massively parallel network systems that learn to convert English text to speech. The memory representations for pronunciations are learned by practice and are shared among many processing units. The performance of NETtalk has some similarities with observed human performance. (i) The learning follows a power law. (ii) The more words the network learns, the better it is at generalizing and correctly pronouncing new words. (iii) The performance of the network degrades very slowly as connections in the network are damaged: no single link or processing unit is essential. (iv) Relearning after damage is much faster than learning during the original training. (v) Distributed or spaced practice is more effective for long-term retention than massed practice. Network models can be constructed that have the same performance and learning characteristics on a particular task, but differ completely at the levels of synaptic strengths and single-unit responses. However, hierarchical clustering techniques applied to NETtalk reveal that these different networks have similar internal representations of letter-to-sound correspondences within groups of processing units. This suggests that invariant internal representations may be found in assemblies of neurons intermediate in size between highly localized and completely distributed representations.
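Point (i) above refers to power-law learning: error falls as a power of the amount of practice, which appears as a straight line on log-log axes. A minimal sketch of how such an exponent can be estimated, using synthetic data rather than NETtalk's actual learning curves (the constants `alpha_true` and `c_true` are illustrative assumptions, not figures from the paper):

```python
import numpy as np

# Hypothetical error curve obeying a power law: error = c * t ** (-alpha).
# On log-log axes this is a line with slope -alpha and intercept log(c).
t = np.arange(1, 101, dtype=float)          # practice trials 1..100
alpha_true, c_true = 0.5, 100.0              # illustrative constants only
error = c_true * t ** (-alpha_true)

# Recover the exponent by linear regression in log-log space.
slope, intercept = np.polyfit(np.log(t), np.log(error), 1)
alpha_est = -slope
c_est = float(np.exp(intercept))
print(alpha_est, c_est)
```

Because the synthetic data follow the power law exactly, the log-log fit recovers the assumed exponent; on real learning curves the same regression gives the best-fitting power-law exponent.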
