Learning optimality-theoretic grammars

We present evidence that Optimality Theory's account of Universal Grammar has manifold implications for learning. The general principles of Optimality Theory (OT; Prince and Smolensky, 1993) are reviewed and illustrated with Grimshaw and Samek-Lodovici's (1995) OT theory of clausal subjects. The optimization structure that OT provides for grammar is used to derive a principled decomposition of the learning problem into two sub-problems: assigning hidden structure to the primary learning data, and learning the grammar that governs that hidden structure. Methods are proposed for analyzing both sub-problems, and their combination is illustrated for the problem of learning a stress system from data lacking metrical constituent boundaries. We present general theorems showing that the proposed solution to the grammar-learning sub-problem exploits the special structure that OT imposes on the space of human grammars to home in on a target grammar correctly and efficiently.
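The interplay of the two sub-problems can be illustrated with a small Python sketch. It is a minimal illustration under stated assumptions, not the authors' implementation: the candidate set for a single trisyllabic input is listed by hand rather than produced by GEN; the constraint names (Trochaic, Iambic, AllFtL, AllFtR) and their violation counts are hypothetical; constraints sharing a stratum have their violations pooled (summed); and ties are broken by list order. Grammar learning is done by an error-driven form of Constraint Demotion on a stratified hierarchy (in the spirit of [25]), and hidden structure is assigned by a simplified interpretive parse (in the spirit of [1]) that picks the most harmonic candidate consistent with the heard stress pattern.

```python
from typing import Dict, List, NamedTuple, Set


class Candidate(NamedTuple):
    """A structural description: overt form plus hidden metrical structure."""
    overt: str             # stress pattern the learner hears, e.g. "010"
    parse: str             # hidden foot structure, e.g. "(σσ́)σ"
    marks: Dict[str, int]  # constraint name -> violation count (hypothetical)


Hierarchy = List[Set[str]]  # stratified hierarchy, highest stratum first


def profile(cand: Candidate, strata: Hierarchy) -> tuple:
    # Assumption: violations of constraints in the same stratum are pooled.
    return tuple(sum(cand.marks.get(c, 0) for c in s) for s in strata)


def optimal(cands: List[Candidate], strata: Hierarchy) -> Candidate:
    # Strict domination = lexicographic comparison of profiles; fewest marks
    # wins, and min() breaks ties in favour of the earliest candidate.
    return min(cands, key=lambda c: profile(c, strata))


def rip(overt: str, cands: List[Candidate], strata: Hierarchy) -> Candidate:
    # Simplified interpretive parsing: the best candidate whose overt form
    # matches the datum, under the learner's current ranking.
    return optimal([c for c in cands if c.overt == overt], strata)


def demote(strata: Hierarchy, winner: Candidate, loser: Candidate) -> Hierarchy:
    """One Constraint Demotion step on a winner-loser pair."""
    level = {c: i for i, s in enumerate(strata) for c in s}
    win = {c: winner.marks.get(c, 0) for c in level}
    lose = {c: loser.marks.get(c, 0) for c in level}
    prefers_winner = [c for c in level if lose[c] > win[c]]
    prefers_loser = [c for c in level if win[c] > lose[c]]
    if not prefers_winner:
        return strata  # no constraint favours the winner: nothing to demote
    top = min(level[c] for c in prefers_winner)
    new = [set(s) for s in strata] + [set()]  # spare stratum at the bottom
    for c in prefers_loser:
        if level[c] <= top:  # not yet dominated: move just below `top`
            new[level[c]].discard(c)
            new[top + 1].add(c)
    return [s for s in new if s]


def learn(data: List[str], cands: List[Candidate], strata: Hierarchy,
          max_passes: int = 20) -> Hierarchy:
    """Alternate interpretive parsing and demotion until no errors remain."""
    for _ in range(max_passes):
        errors = 0
        for overt in data:
            loser = optimal(cands, strata)  # the learner's own production
            if loser.overt != overt:        # error: it mismatches the datum
                winner = rip(overt, cands, strata)  # hidden structure for datum
                strata = demote(strata, winner, loser)
                errors += 1
        if errors == 0:
            break
    return strata


# Hand-listed candidates for one trisyllabic input: a single binary foot,
# left- or right-aligned, trochaic or iambic (names and counts hypothetical).
CANDS = [
    Candidate("100", "(σ́σ)σ", {"Trochaic": 0, "Iambic": 1, "AllFtL": 0, "AllFtR": 1}),
    Candidate("010", "(σσ́)σ", {"Trochaic": 1, "Iambic": 0, "AllFtL": 0, "AllFtR": 1}),
    Candidate("010", "σ(σ́σ)", {"Trochaic": 0, "Iambic": 1, "AllFtL": 1, "AllFtR": 0}),
    Candidate("001", "σ(σσ́)", {"Trochaic": 1, "Iambic": 0, "AllFtL": 1, "AllFtR": 0}),
]

if __name__ == "__main__":
    start = [{"Trochaic", "Iambic", "AllFtL", "AllFtR"}]  # everything unranked
    grammar = learn(["001"], CANDS, start)
    print([sorted(s) for s in grammar])   # [['AllFtR', 'Iambic'], ['AllFtL', 'Trochaic']]
    print(optimal(CANDS, grammar).parse)  # σ(σσ́)
```

Learning from the single datum "001" demotes Trochaic and AllFtL below Iambic and AllFtR, after which the learner's own optimum is the right-aligned iamb. The datum "010" shows why hidden structure matters: it is consistent with both (σσ́)σ and σ(σ́σ), so the learner's current ranking, not the overt form alone, decides which parse is used as the winner for demotion.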

[1] Bruce Tesar et al. Robust Interpretive Parsing in Metrical Stress Theory, 1998.

[2] J. Grimshaw. Projection, heads, and optimality, 1997.

[3] P. Smolensky et al. When is less more? Faithfulness and minimal links in wh-chains, 1998.

[4] Eric Baković. Optimality and inversion in Spanish, 1998.

[5] Ken Safir et al. Comments on Wexler and Manzini, 1987.

[6] R. Shillcock et al. Proceedings of the Twenty-Sixth Annual Conference of the Cognitive Science Society, 1998.

[7] Bruce Tesar et al. Computing Optimal Forms in Optimality Theory: Basic Syllabification (CU-CS-763-95), 2008.

[8] Lauri Karttunen et al. The Proper Treatment of Optimality in Computational Phonology, 1998, ArXiv.

[9] Giorgio Satta et al. Optimality Theory and the Generative Complexity of Constraint Violability, 1998, CL.

[10] T. Mark Ellison et al. Phonological Derivation in Optimality Theory, 1994, COLING.

[11] Bruce Tesar et al. Computing Optimal Descriptions for Optimality Theory Grammars with Context-Free Position Structures, 1996, ACL.

[12] Bruce Tesar et al. An iterative strategy for language learning, 1998.

[13] Jason Eisner. Efficient Generation in Primitive Optimality Theory, 1997, ACL.

[14] Vieri Samek-Lodovici et al. Constraints on subjects: an optimality theoretic analysis, 1996.

[15] John Cocke et al. A Statistical Approach to Machine Translation, 1990, CL.

[16] Partha Niyogi et al. Formalizing Triggers: A Learning Model for Finite Spaces, 1993.

[17] Paul Smolensky et al. Schema Selection and Stochastic Inference in Modular Environments, 1983, AAAI.

[18] Katherine Demuth et al. Markedness and the Development of Prosodic Structure, 1995.

[19] M. Rita Manzini et al. Parameters and Learnability in Binding Theory, 1987.

[20] L. Baum et al. Statistical Inference for Probabilistic Functions of Finite State Markov Chains, 1966.

[21] Michael C. Mozer et al. Mathematical Perspectives on Neural Networks, 1996.

[22] Alan Prince et al. Prosodic morphology: constraint interaction and satisfaction, 1993.

[23] Michael Hammond et al. Parsing in OT, 1997.

[24] Y. Miyata et al. Harmonic grammar: A formal multi-level connectionist theory of linguistic well-formedness: Theoretic, 1990.

[25] P. Smolensky et al. The Learnability of Optimality Theory: An Algorithm and Some Basic Complexity Results, 1993.

[26] Bruce Tesar et al. Computational optimality theory, 1996.

[27] Geoffrey E. Hinton. Connectionist Learning Procedures, 1989, Artif. Intell.

[28] William J. Turkel et al. The Logical Problem of Language Acquisition in Optimality Theory, 1998.

[29] P. Smolensky. On the comprehension/production dilemma in child language, 1996.

[30] Alan S. Prince et al. Faithfulness and reduplicative identity, 1995.

[31] S. Kapur et al. On the use of triggers in parameter setting, 1996.

[32] Pilar Barbosa et al. Is the best good enough? Optimality and competition in syntax, 1998.

[33] Clara C. Levelt et al. Syllable Types in Cross-linguistic and Developmental Grammars, 1998.

[34] B. Dresher et al. A computational learning model for metrical phonology, 1990, Cognition.

[35] Lalit R. Bahl et al. A Maximum Likelihood Approach to Continuous Speech Recognition, 1983, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[36] Bruce Tesar et al. An Iterative Strategy for Learning Metrical Stress in Optimality Theory, 1996.

[37] Paul Smolensky et al. Information processing in dynamical systems: foundations of harmony theory, 1986.