Training cascaded networks for speeded decisions using a temporal-difference loss

Although deep feedforward neural networks share some characteristics with the primate visual system, a key distinction is their dynamics. Deep nets typically operate in sequential stages wherein each layer fully completes its computation before processing begins in subsequent layers. In contrast, biological systems have cascaded dynamics: information propagates from neurons at all layers in parallel, but transmission is gradual over time. In our work, we construct a cascaded ResNet by introducing a propagation delay into each residual block and updating all layers in parallel in a stateful manner. Because information transmitted through skip connections avoids delays, the functional depth of the architecture increases over time, yielding a trade-off between processing speed and accuracy. We introduce a temporal-difference (TD) training loss that achieves a strictly superior speed-accuracy profile over standard losses. The CASCADEDTD model has intriguing properties, including: typical instances are classified more rapidly than atypical instances; CASCADEDTD is more robust to both persistent and transient noise than is a conventional ResNet; and the time-varying output trace of CASCADEDTD provides a signal that can be used by ‘meta-cognitive’ models for out-of-distribution (OOD) detection and to determine when to terminate processing.
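
The two ingredients described above, a residual block whose non-skip branch is delayed by one time step and a TD-style loss that pulls each step's prediction toward the next step's, can be sketched roughly as follows. This is a minimal illustrative sketch, not the authors' implementation: the names (`CascadedBlock`, `cascaded_forward`, `td_loss`), the one-step delay, the flat `lam` weighting, and the KL-based TD target are all assumptions made for exposition.

```python
import torch.nn as nn
import torch.nn.functional as F


class CascadedBlock(nn.Module):
    """Residual block whose residual branch is delayed by one time step."""

    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x_now, x_prev):
        # Skip connection carries the current-step input with no delay; the
        # residual branch is computed from the previous step's input.
        residual = self.bn2(self.conv2(F.relu(self.bn1(self.conv1(x_prev)))))
        return F.relu(x_now + residual)


def cascaded_forward(blocks, stem_out, readout, n_steps):
    """Unroll the cascade: each block's delayed branch reads last step's state,
    while the skip paths propagate the current step's signal without delay."""
    states = [stem_out] * (len(blocks) + 1)   # states[i] = input to block i at t-1
    logits_per_step = []
    for _ in range(n_steps):
        new_states = [stem_out]
        for i, block in enumerate(blocks):
            new_states.append(block(new_states[i], states[i]))
        states = new_states
        logits_per_step.append(readout(states[-1]))
    return logits_per_step


def td_loss(logits_per_step, target, lam=0.9):
    """TD-style loss: each step's output is regressed toward the next step's
    (detached) output; only the final step is anchored to the true label."""
    loss = F.cross_entropy(logits_per_step[-1], target)
    for t in range(len(logits_per_step) - 1):
        td_target = F.softmax(logits_per_step[t + 1], dim=1).detach()
        loss = loss + lam * F.kl_div(
            F.log_softmax(logits_per_step[t], dim=1), td_target,
            reduction="batchmean",
        )
    return loss
```

Under this reading, at the first step only the skip paths carry signal, so the network is functionally shallow; as steps elapse, the delayed residual branches contribute and accuracy improves, and the TD loss trains every intermediate readout rather than only the final one.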
