Coordination Among Neural Modules Through a Shared Global Workspace

Deep learning has seen a movement away from representing examples with a monolithic hidden state toward richly structured states. For example, Transformers segment representations by position, and object-centric architectures decompose images into entities. In all of these architectures, interactions among elements are modeled pairwise: Transformers use self-attention to incorporate information from other positions, and object-centric architectures use graph neural networks to model interactions among entities. We consider how to improve on pairwise interactions in terms of global coordination and a coherent, integrated representation that can be used for downstream tasks. In cognitive science, a global workspace architecture has been proposed in which functionally specialized components share information through a common, bandwidth-limited communication channel. We explore the use of such a communication channel in the context of deep learning for modeling the structure of complex environments. The proposed method includes a shared workspace through which communication among different specialist modules takes place, but because the communication bandwidth is limited, specialist modules must compete for access. We show that capacity limitations have a rational basis in that (1) they encourage specialization and compositionality and (2) they facilitate the synchronization of otherwise independent specialists.
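
To make the mechanism concrete, the sketch below shows one way such a bandwidth-limited workspace could be wired up in PyTorch. It is an illustrative assumption rather than the paper's implementation: the slot count, the top-k competition rule used for the write step, and all names (SharedWorkspace, write_q, read_q, and so on) are hypothetical. Specialist states first compete, via top-k attention, to write into a small number of workspace slots; the updated slots are then broadcast back to every specialist.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedWorkspace(nn.Module):
    """Minimal sketch of a bandwidth-limited shared workspace (hypothetical,
    not the paper's exact implementation): specialists compete to write into
    a few workspace slots, and the slots are then broadcast back to all."""

    def __init__(self, d_model=128, n_slots=4, top_k=2):
        super().__init__()
        self.n_slots = n_slots
        self.top_k = top_k
        # Learned initial memory for the workspace slots.
        self.slots = nn.Parameter(torch.randn(n_slots, d_model) * 0.02)
        # Write step: workspace slots query the specialists.
        self.write_q = nn.Linear(d_model, d_model)
        self.write_k = nn.Linear(d_model, d_model)
        self.write_v = nn.Linear(d_model, d_model)
        # Broadcast step: specialists query the workspace.
        self.read_q = nn.Linear(d_model, d_model)
        self.read_k = nn.Linear(d_model, d_model)
        self.read_v = nn.Linear(d_model, d_model)

    def forward(self, specialists):
        # specialists: (batch, n_specialists, d_model)
        B, N, D = specialists.shape
        slots = self.slots.unsqueeze(0).expand(B, -1, -1)  # (B, n_slots, D)

        # Write: specialists compete for the limited workspace bandwidth.
        q = self.write_q(slots)                      # (B, n_slots, D)
        k = self.write_k(specialists)                # (B, N, D)
        v = self.write_v(specialists)                # (B, N, D)
        scores = q @ k.transpose(-2, -1) / D ** 0.5  # (B, n_slots, N)
        # Keep only the top-k specialists per slot; mask out the rest.
        topk = scores.topk(self.top_k, dim=-1).indices
        mask = torch.full_like(scores, float('-inf')).scatter(-1, topk, 0.0)
        attn = F.softmax(scores + mask, dim=-1)
        slots = slots + attn @ v                     # updated workspace

        # Broadcast: every specialist reads from the updated workspace.
        q = self.read_q(specialists)
        k = self.read_k(slots)
        v = self.read_v(slots)
        read = F.softmax(q @ k.transpose(-2, -1) / D ** 0.5, dim=-1) @ v
        return specialists + read                    # (B, N, D)


# Usage: 8 specialist states of width 128, batch of 2.
ws = SharedWorkspace(d_model=128, n_slots=4, top_k=2)
out = ws(torch.randn(2, 8, 128))
print(out.shape)  # torch.Size([2, 8, 128])

The top-k mask is what enforces the capacity limit in this sketch: for each slot only a few specialists get to write, so specialists must compete for, and specialize to win, access to the shared channel, while the broadcast step synchronizes all specialists on the workspace contents.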
