No Multiplication? No Floating Point? No Problem! Training Networks for Efficient Inference

For successful deployment of deep neural networks on highly resource-constrained devices (hearing aids, earbuds, wearables), we must simplify the types of operations and the memory/power resources used during inference. Completely avoiding inference-time floating-point operations is one of the simplest ways to design networks for these highly constrained environments. By discretizing both our in-network non-linearities and our network weights, we can move to simple, compact networks without floating-point operations, without multiplications, and without any non-linear function computations. Our approach allows us to explore the spectrum of possible networks, ranging from fully continuous versions down to networks with bi-level weights and activations. Our results show that discretization can be done without loss of performance, and that we can train a network that operates without floating point, without multiplication, and with less RAM on both regression tasks (autoencoding) and multi-class classification tasks (ImageNet). The memory needed to deploy our discretized networks is less than one third of that required by an equivalent architecture that uses floating-point operations.
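To make the claim concrete, the sketch below (Python/NumPy) illustrates why fully discretized weights and activations remove both floating point and multiplication from inference. The layer sizes, the {-1, +1} weight coding, the {0, 1} activations, and the integer thresholds standing in for bias plus non-linearity are illustrative assumptions, not the paper's exact scheme: with binary inputs, a dot product collapses into integer additions and subtractions followed by a comparison.

```python
# A minimal sketch (assumed coding, not the paper's exact method) of how
# inference can avoid floating point and multiplication once weights and
# activations are discrete: weights in {-1, +1}, activations in {0, 1}.

import numpy as np

def binary_dense_forward(x_bits: np.ndarray,
                         w_sign: np.ndarray,
                         thresholds: np.ndarray) -> np.ndarray:
    """Integer-only forward pass for one fully connected layer.

    x_bits:     (in_dim,)          activations in {0, 1}, integer dtype
    w_sign:     (in_dim, out_dim)  weights in {-1, +1}, integer dtype
    thresholds: (out_dim,)         integer thresholds standing in for
                                   bias + binary activation function
    """
    # With x in {0, 1}, x . w is just the sum of the signed weights at the
    # positions where x == 1 -- pure integer addition, no multiplications.
    pre_activation = w_sign[x_bits == 1].sum(axis=0)
    # Thresholding replaces the non-linearity and yields binary outputs.
    return (pre_activation >= thresholds).astype(np.int64)

# Tiny usage example with hypothetical shapes.
rng = np.random.default_rng(0)
x = rng.integers(0, 2, size=8)            # 8 binary input activations
w = rng.choice([-1, 1], size=(8, 4))      # 8x4 sign weights
t = rng.integers(-2, 3, size=4)           # integer thresholds per output unit
print(binary_dense_forward(x, w, t))      # e.g. a length-4 binary vector
```

Because every intermediate value is an integer, such a layer can run on hardware with no floating-point unit; the harder part, which the paper addresses, is training a network so that it still performs well after weights and activations are discretized (commonly handled with techniques such as straight-through gradient estimation).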
