Are Neural Nets Modular? Inspecting Functional Modularity Through Differentiable Weight Masks

Neural networks (NNs) whose subnetworks implement reusable functions are expected to offer numerous advantages, including compositionality through efficient recombination of functional building blocks, interpretability, and prevention of catastrophic interference. Understanding whether and how NNs are modular could provide insights into how to improve them. Current inspection methods, however, fail to link modules to their functionality. In this paper, we present a novel method based on learning binary weight masks to identify the individual weights and subnets responsible for specific functions. Using this tool, we contribute an extensive study of emerging modularity in NNs that covers several standard architectures and datasets. We demonstrate how common NNs fail to reuse submodules and offer new insights into the related issue of systematic generalization on language tasks.
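
To make the mask-learning idea more concrete, the sketch below shows one way a differentiable binary mask over the frozen weights of a single linear layer could be trained: a Gumbel-Sigmoid relaxation with a straight-through estimator, plus an L1-style penalty that pushes weights not needed for the chosen task toward removal. This is an illustrative sketch under these assumptions, not the authors' implementation; names such as `MaskedLinear`, `tau`, and `sparsity_coef` are invented for this example.

```python
# Minimal sketch (not the paper's exact code): learn a binary mask over the
# frozen weights of one linear layer with a Gumbel-Sigmoid relaxation and a
# straight-through estimator.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MaskedLinear(nn.Module):
    """Wraps a frozen linear layer; only the per-weight mask logits are trained."""

    def __init__(self, layer: nn.Linear, tau: float = 1.0):
        super().__init__()
        self.weight = nn.Parameter(layer.weight.detach().clone(), requires_grad=False)
        self.bias = nn.Parameter(layer.bias.detach().clone(), requires_grad=False)
        # One learnable logit per weight; sigmoid(logit) is the keep-probability.
        self.mask_logits = nn.Parameter(torch.zeros_like(self.weight))
        self.tau = tau

    def sample_mask(self) -> torch.Tensor:
        if self.training:
            # Gumbel-Sigmoid (binary Concrete) sample at temperature tau.
            u = torch.rand_like(self.mask_logits).clamp(1e-6, 1 - 1e-6)
            noise = torch.log(u) - torch.log1p(-u)  # logistic noise
            soft = torch.sigmoid((self.mask_logits + noise) / self.tau)
        else:
            soft = torch.sigmoid(self.mask_logits)
        # Straight-through: forward uses the hard 0/1 mask,
        # backward uses the gradient of the soft relaxation.
        hard = (soft > 0.5).float()
        return hard + soft - soft.detach()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.linear(x, self.weight * self.sample_mask(), self.bias)


# Usage sketch: keep the layer solving one chosen (placeholder) task while an
# L1 term on the keep-probabilities encourages dropping unneeded weights.
layer = nn.Linear(16, 4)            # stands in for a layer of a trained model
masked = MaskedLinear(layer)
opt = torch.optim.Adam([masked.mask_logits], lr=1e-2)
sparsity_coef = 1e-3                # strength of the pressure to remove weights

x = torch.randn(32, 16)
y = torch.randint(0, 4, (32,))      # placeholder labels for the chosen task
for _ in range(100):
    opt.zero_grad()
    loss = F.cross_entropy(masked(x), y)
    loss = loss + sparsity_coef * torch.sigmoid(masked.mask_logits).sum()
    loss.backward()
    opt.step()

print("weights kept:", int((masked.mask_logits > 0).sum()))
```

The surviving mask then identifies which weights the layer actually needs for that task; comparing masks obtained for different tasks or subfunctions is what allows statements about weight sharing and reuse between them.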
