Differentiable symbol manipulation and language induction

There is a large class of learning problems that are inherently discrete. Connectionist models that operate on continuous weight and activation spaces are not always appropriate for such tasks, because their learning procedures tend to produce solutions that lack a symbolic interpretation. In this thesis, I demonstrate that differentiable symbol manipulation (DSM) techniques can be incorporated into a connectionist system to overcome this nonsymbolic tendency. The hypothesis underlying this methodology is that a connectionist model can benefit from exploring a subsymbolic solution space even when the final solutions have symbolic interpretations. I apply the idea of DSM to the domain of language induction: given a finite set of examples (sentences) from a language, the goal is to induce the target language by identifying the structural regularities underlying the examples. Language induction is a strictly symbolic domain, concerned with rule identification and symbol manipulation, and learning such tasks with subsymbolic connectionist methods has long been a challenge.

I explore three connectionist models that use DSM. The first learns to induce finite-state machines (regular languages) by inducing discrete representations of their states. It incorporates an adaptive clustering technique into a standard recurrent connectionist architecture, quantizing the state space as an integral part of learning. Simulations show that this architecture yields a significant improvement in generalization performance over earlier connectionist approaches. The second model incorporates DSM to learn symbolic rewrite rules for a subclass of context-free grammars. The third incorporates DSM to learn another subclass of context-free languages; in contrast to the second model, it learns the dynamics of a pushdown automaton. Both the second and third models learn to manipulate symbol strings, a feature that distinguishes them from prior research on similar tasks.

While DSM has many strengths, I also examine several weaknesses of the current models, most notably issues of scaling. I conclude by discussing directions for future research aimed at achieving a robust language induction system.
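
The abstract describes the first model's state quantization only at a high level. Purely as an illustrative sketch of that idea, and not the thesis's actual architecture or training procedure, the fragment below snaps a recurrent network's hidden state to the nearest of a small set of centroids at every time step, so that processing a string traces a path through a finite set of discrete states. All names and dimensions are assumptions, and the adaptive updating of the centroids during learning is omitted.

    import numpy as np

    class QuantizedRNN:
        """Hypothetical sketch: an Elman-style recurrent net whose hidden state
        is snapped to the nearest of K cluster centroids at every step, so the
        continuous state space behaves like a finite set of automaton states."""

        def __init__(self, n_in, n_hidden, n_clusters, seed=0):
            rng = np.random.default_rng(seed)
            self.W_in = rng.normal(0.0, 0.1, (n_hidden, n_in))
            self.W_rec = rng.normal(0.0, 0.1, (n_hidden, n_hidden))
            self.w_out = rng.normal(0.0, 0.1, n_hidden)
            # Centroids stand in for discrete FSM states; in the thesis they
            # would be adapted by the clustering procedure during learning.
            self.centroids = rng.normal(0.0, 0.1, (n_clusters, n_hidden))

        def quantize(self, h):
            # Replace the continuous hidden state by its nearest centroid.
            distances = np.linalg.norm(self.centroids - h, axis=1)
            return self.centroids[np.argmin(distances)]

        def forward(self, xs):
            # xs: sequence of one-hot input vectors; returns P(accept).
            h = np.zeros(self.W_rec.shape[0])
            for x in xs:
                h = np.tanh(self.W_in @ x + self.W_rec @ h)
                h = self.quantize(h)  # discrete state transition
            return float(1.0 / (1.0 + np.exp(-self.w_out @ h)))

    # Example: score a binary string for membership in a regular language.
    net = QuantizedRNN(n_in=2, n_hidden=8, n_clusters=4)
    onehot = {"0": np.array([1.0, 0.0]), "1": np.array([0.0, 1.0])}
    p_accept = net.forward([onehot[c] for c in "0110"])

Because every transition lands on one of finitely many centroids, a trained network of this kind can in principle be read off directly as a finite-state machine, which is the symbolic interpretation the quantization is meant to secure.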