A Markov model for the acquisition of morphological structure

We describe a new formalism for word morphology. Our model views word generation as a random walk on a trellis of units, where each unit is a set of (short) strings. The model naturally incorporates the segmentation of words into morphemes. We capture the statistics of unit generation using a probabilistic suffix tree (PST), a variant of variable-length Markov models. We present an efficient algorithm for learning a PST over the units; the learned PST is a compact stochastic representation of morphological structure. We demonstrate the applicability of our approach by using the model in an allomorphy decision problem.
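The abstract does not spell out the learning algorithm. As a rough illustration of the PST idea only, the Python sketch below makes several simplifying assumptions. Each unit is a single short string rather than a set of strings. Contexts are pruned by a plain count threshold, which stands in for the learning algorithm described here. An allomorph is chosen by comparing next-unit probabilities. The class `SimplePST`, the boundary markers `<s>`/`</s>`, the parameters `max_depth` and `min_count`, the helper `choose_allomorph`, and the toy corpus are all hypothetical.

```python
from typing import Dict, Sequence, Tuple

Context = Tuple[str, ...]
BOS, EOS = "<s>", "</s>"  # hypothetical word-boundary units


class SimplePST:
    """A simplified probabilistic suffix tree over units (illustrative only).

    Each node is a context: a tuple of the most recent preceding units.
    A context is kept only if it occurred at least `min_count` times; the
    root () is always kept. Prediction backs off to the longest kept suffix
    of the history. Probabilities are unsmoothed relative frequencies.
    """

    def __init__(self, max_depth: int = 2, min_count: int = 2) -> None:
        self.max_depth = max_depth
        self.min_count = min_count
        self.counts: Dict[Context, Dict[str, int]] = {}

    def fit(self, words: Sequence[Sequence[str]]) -> "SimplePST":
        raw: Dict[Context, Dict[str, int]] = {}
        for units in words:
            seq = [BOS, *units, EOS]
            for i in range(1, len(seq)):
                # Count seq[i] under every context of length 0..max_depth.
                for k in range(0, min(self.max_depth, i) + 1):
                    ctx = tuple(seq[i - k:i])
                    nxt = raw.setdefault(ctx, {})
                    nxt[seq[i]] = nxt.get(seq[i], 0) + 1
        # Prune rare contexts. A suffix occurs at least as often as any longer
        # context ending in it, so the kept set stays suffix-closed (a tree).
        self.counts = {
            ctx: nxt for ctx, nxt in raw.items()
            if ctx == () or sum(nxt.values()) >= self.min_count
        }
        return self

    def context_for(self, history: Sequence[str]) -> Context:
        """Longest suffix of `history` (at most max_depth units) kept in the tree."""
        for k in range(min(self.max_depth, len(history)), 0, -1):
            ctx = tuple(history[len(history) - k:])
            if ctx in self.counts:
                return ctx
        return ()

    def prob(self, history: Sequence[str], unit: str) -> float:
        """P(unit | history), read off the node for the longest kept context."""
        nxt = self.counts[self.context_for(history)]
        return nxt.get(unit, 0) / sum(nxt.values())

    def word_prob(self, units: Sequence[str]) -> float:
        """Probability of the walk <s> u1 ... un </s> under the model."""
        seq = [BOS, *units, EOS]
        p = 1.0
        for i in range(1, len(seq)):
            p *= self.prob(seq[:i], seq[i])
        return p


def choose_allomorph(pst: SimplePST, stem: Sequence[str],
                     candidates: Sequence[str]) -> str:
    """Hypothetical decision rule: pick the suffix unit the PST finds most likely."""
    return max(candidates, key=lambda c: pst.prob(stem, c))


# Toy corpus of pre-segmented words (hypothetical data).
corpus = [["ca", "t", "s"], ["ha", "t", "s"], ["do", "g", "s"], ["cu", "p", "s"],
          ["bo", "x", "es"], ["fo", "x", "es"], ["ta", "x", "es"]]
pst = SimplePST(max_depth=2, min_count=2).fit(corpus)
print(choose_allomorph(pst, ["fi", "x"], ["s", "es"]))  # es
print(choose_allomorph(pst, ["ki", "t"], ["s", "es"]))  # s
```

Backing off to the longest retained suffix is what makes the memory length variable. In the toy decisions, the unseen stems `fi x` and `ki t` are handled by the one-unit contexts `x` and `t`, because the longer two-unit contexts were too rare to keep.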
