Control models of natural language parsing

Most recent statistical parsers fall into one of two groups. The larger group consists of parsers that are based on some variation of a probabilistic context-free grammar, use joint probability models, and use tabular methods to find the most probable parse. Parsers in the second group are based on probabilistic push-down automata, use conditional probability models, and use some form of state-space search to find the most probable parse. This thesis studies natural language parsing as a control problem, a view that leads to parsers of the second type. We show that search can be done very efficiently for such parsers. The control approach leads to a particular interpretation of the history-based parsing tradition, in which history is equated with state. We call the corresponding probability model a Markov parsing model; it can be used both for syntactic disambiguation and for search. The resulting parsers are simple, fast, have excellent coverage, and are reasonably accurate. Using treebanks (collections of text that experts have annotated with syntactic structure), we learn controllers for parsers that can be applied with little or no search. We call these controllers greedy or nearly greedy policies. The thesis is thus a study of parsers that are constrained to operate efficiently.
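To make the control view concrete, the following minimal Python sketch shows a greedy policy driving a shift-reduce (push-down) parser: at each step the controller looks only at the current state and takes the most probable legal action, with no search. It is an illustration under stated assumptions, not the parser developed in this thesis. The names `legal_actions`, `step`, `greedy_parse`, `prefer_reduce`, and `prefer_shift` are hypothetical, the two hand-written policies stand in for a model learned from a treebank, and the placeholder label "X" replaces real syntactic categories.

```python
from typing import Callable, Dict, List, Sequence, Tuple, Union

# A tree is a word or a binary node (label, left, right). Assumption: trees
# are unlabeled binary trees, with "X" as a placeholder label.
Tree = Union[str, Tuple[str, "Tree", "Tree"]]
# The parser state is (stack, remaining input); the policy conditions on it alone.
State = Tuple[Tuple[Tree, ...], Tuple[str, ...]]
Policy = Callable[[State], Dict[str, float]]


def legal_actions(state: State) -> List[str]:
    """Actions the push-down automaton allows in this state."""
    stack, buffer = state
    actions = []
    if buffer:
        actions.append("shift")
    if len(stack) >= 2:
        actions.append("reduce")
    return actions


def step(state: State, action: str) -> State:
    """Apply one action and return the successor state."""
    stack, buffer = state
    if action == "shift":
        # Move the next input word onto the stack.
        return stack + (buffer[0],), buffer[1:]
    # reduce: combine the top two stack items into one node.
    left, right = stack[-2], stack[-1]
    return stack[:-2] + (("X", left, right),), buffer


def greedy_parse(words: Sequence[str], policy: Policy) -> Tree:
    """Follow the policy greedily: always take the most probable legal action."""
    if not words:
        raise ValueError("cannot parse an empty sentence")
    state: State = ((), tuple(words))
    while True:
        actions = legal_actions(state)
        if not actions:  # input consumed and a single tree remains
            break
        probs = policy(state)
        best = max(actions, key=lambda a: probs.get(a, 0.0))
        state = step(state, best)
    return state[0][0]


# Hypothetical hand-written policies standing in for a learned controller.
def prefer_reduce(state: State) -> Dict[str, float]:
    return {"shift": 0.4, "reduce": 0.6}


def prefer_shift(state: State) -> Dict[str, float]:
    return {"shift": 0.6, "reduce": 0.4}


if __name__ == "__main__":
    print(greedy_parse(["the", "dog", "barks"], prefer_reduce))
    # → ('X', ('X', 'the', 'dog'), 'barks')
```

Because the policy sees only the current state, swapping `prefer_reduce` for `prefer_shift` changes the analysis (here, from a left-branching to a right-branching tree) without touching the parser itself. A nearly greedy variant would keep a small number of alternative states at each step rather than exactly one.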