Dynamic On-line Clustering and State Extraction: An Approach to Symbolic Learning

Although recurrent neural nets have been moderately successful in learning to emulate finite-state machines (FSMs), the continuous internal state dynamics of a neural net are not well matched to the discrete behavior of an FSM. We describe an architecture, called DOLCE (dynamic on-line clustering and state extraction), that allows discrete states to evolve in a net as learning progresses. DOLCE consists of a standard recurrent neural net trained by gradient descent, combined with an adaptive clustering technique that quantizes the state space. We describe two implementations of DOLCE. The first, DOLCE(u), uses an adaptive clustering scheme in an unsupervised mode to determine both the number of clusters and the partitioning of the state space as learning progresses. The second, DOLCE(s), uses a Gaussian mixture model in a supervised learning framework to infer the states of an FSM. DOLCE(s) is based on the assumption that the task requires a finite set of discrete internal states, and that the actual network state belongs to this set but has been corrupted by noise arising from inaccuracy in the weights. DOLCE(s) learns to recover, from the noisy network state, the discrete state with maximum a posteriori probability. Simulations show that both implementations of DOLCE lead to a significant improvement in generalization performance over earlier neural net approaches to FSM induction. The adaptive quantization technique is not specific to DOLCE and could be applied in other domains as well.
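
As a purely illustrative sketch of the state-quantization idea behind DOLCE(s), the Python fragment below snaps a noisy recurrent hidden state to the mean of the most probable component of a Gaussian mixture, i.e., the maximum a posteriori discrete state. All names and values here (HIDDEN, N_STATES, the fixed mixture, the toy recurrent weights) are assumptions made for illustration; in the actual model the clustering adapts as learning progresses rather than being fixed in advance, and the network weights are trained by gradient descent.

```python
import numpy as np

rng = np.random.default_rng(0)

HIDDEN = 4     # hidden-state dimensionality (assumed for illustration)
N_STATES = 3   # hypothesized number of discrete FSM states (assumed)

# Gaussian mixture over the hidden-state space: equal priors,
# shared isotropic variance, randomly placed means (all assumptions).
means = rng.normal(size=(N_STATES, HIDDEN))
log_priors = np.full(N_STATES, -np.log(N_STATES))
var = 0.1

# Toy recurrent weights standing in for a trained network.
W_hh = rng.normal(scale=0.3, size=(HIDDEN, HIDDEN))
W_xh = rng.normal(scale=0.3, size=(HIDDEN, 1))


def map_quantize(h):
    """Return the mean of the mixture component with the highest
    posterior probability given the (noisy) hidden state h."""
    # Log posterior up to a constant: log prior + isotropic Gaussian log-likelihood.
    sq_dist = np.sum((means - h) ** 2, axis=1)
    log_post = log_priors - sq_dist / (2.0 * var)
    return means[np.argmax(log_post)]


def step(h, x):
    """One recurrent update followed by MAP quantization of the state."""
    h_noisy = np.tanh(W_hh @ h + W_xh @ x)
    return map_quantize(h_noisy)


# Run the quantized recurrence over a short binary input sequence.
h = np.zeros(HIDDEN)
for symbol in [0.0, 1.0, 1.0, 0.0]:
    h = step(h, np.array([symbol]))
    print(h)
```

Because every update ends by projecting onto one of a finite set of cluster centers, the trajectory of hidden states is confined to a discrete set, which is the sense in which the quantization encourages FSM-like behavior.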
