Are Neural Nets Modular? Inspecting Their Functionality Through Differentiable Weight Masks

Neural networks (NNs) whose subnetworks implement reusable functions are expected to offer numerous advantages, e.g., compositionality through efficient recombination of functional building blocks, interpretability, prevention of catastrophic interference through separation, etc. Understanding if and how NNs are modular could provide insights into how to improve them. Current inspection methods, however, fail to link modules to their function. We present a novel method based on learning binary weight masks to identify the individual weights and subnets responsible for specific functions. This powerful tool shows that typical NNs fail to reuse submodules and instead learn redundant copies. It also yields new insights into known generalization issues with the SCAN dataset. Our method further unveils class-specific weights of CNN classifiers and shows the extent to which classifications depend on them. Our findings open many important directions for future research.
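The core idea of learning binary weight masks over a frozen trained network can be illustrated with a minimal sketch. The toy setup below is hypothetical (a single frozen linear layer, a made-up regression target, and an arbitrary sparsity coefficient); the paper's actual models, tasks, and hyperparameters differ. The mask logits are the only trainable parameters, and binary masks are sampled with a Gumbel-Sigmoid relaxation plus a straight-through estimator so the discrete masks remain differentiable:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Frozen "trained" network: a single linear layer whose weights stay fixed.
W = torch.randn(8, 4)

# Learnable per-weight mask logits; only these are optimized, not W.
logits = torch.zeros_like(W, requires_grad=True)
opt = torch.optim.Adam([logits], lr=0.1)

def sample_mask(logits, tau=1.0):
    # Gumbel-Sigmoid relaxation with a straight-through estimator:
    # the forward pass uses hard {0,1} masks, the backward pass uses
    # the gradient of the soft relaxation.
    u = torch.rand_like(logits)
    noise = torch.log(u) - torch.log1p(-u)  # logistic noise
    soft = torch.sigmoid((logits + noise) / tau)
    hard = (soft > 0.5).float()
    return hard + soft - soft.detach()

# Toy "function" to localize: match the full network's outputs on some
# inputs, while a sparsity term pushes the mask to keep few weights.
x = torch.randn(32, 4)
target = x @ W.t()

for step in range(300):
    m = sample_mask(logits)
    pred = x @ (W * m).t()           # run the network with masked weights
    loss = F.mse_loss(pred, target) + 1e-3 * torch.sigmoid(logits).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

kept = (torch.sigmoid(logits) > 0.5).float().mean().item()
print(f"fraction of weights kept: {kept:.2f}")
```

The surviving weights (mask probability above 0.5) can then be read off as the subnetwork responsible for the probed function; comparing the masks obtained for different functions reveals how much of the network they share.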
