Diversity Networks: Neural Network Compression Using Determinantal Point Processes

We introduce Divnet, a flexible technique for learning networks with diverse neurons. Divnet models neuronal diversity by placing a Determinantal Point Process (DPP) over the neurons in a given layer. It uses this DPP to select a subset of diverse neurons and then fuses the redundant neurons into the selected ones. Compared with previous approaches, Divnet offers a more principled, flexible way of capturing neuronal diversity, and thereby implicitly enforces regularization. This enables effective auto-tuning of the network architecture and yields smaller networks without hurting performance. Moreover, through its focus on diversity and neuron fusing, Divnet remains compatible with other procedures that seek to reduce the memory footprint of networks. We present experimental results to corroborate our claims: for pruning neural networks, Divnet is notably superior to competing approaches.
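
The abstract outlines a three-step recipe: build a similarity kernel over a layer's neurons from their activations, draw a diverse subset with a DPP, and fold the pruned neurons' outgoing weights into the retained ones. The sketch below illustrates that pipeline under stated assumptions and is not the paper's implementation: the RBF kernel, the greedy log-determinant selection (a stand-in for exact k-DPP sampling), the least-squares fusion step, and the function names (build_kernel, select_diverse, fuse) are all illustrative choices.

import numpy as np

def build_kernel(acts, bandwidth=1.0):
    # acts: (n_neurons, n_samples) activations collected on a data batch.
    # Returns an RBF similarity kernel between neurons (an assumed choice).
    sq_dists = np.square(acts[:, None, :] - acts[None, :, :]).sum(-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))

def select_diverse(L, k):
    # Greedy log-det maximisation as a stand-in for exact k-DPP sampling:
    # repeatedly add the neuron that most increases det(L restricted to the set).
    selected = []
    for _ in range(k):
        best, best_ld = None, -np.inf
        for i in range(L.shape[0]):
            if i in selected:
                continue
            idx = selected + [i]
            sign, ld = np.linalg.slogdet(L[np.ix_(idx, idx)])
            if sign > 0 and ld > best_ld:
                best, best_ld = i, ld
        selected.append(best)
    return selected

def fuse(acts, W_next, keep):
    # Approximate each pruned neuron's activations as a linear combination of
    # the kept neurons' activations (least squares), then merge that
    # contribution into the next layer's outgoing weights.
    drop = [i for i in range(acts.shape[0]) if i not in keep]
    coeffs, *_ = np.linalg.lstsq(acts[keep].T, acts[drop].T, rcond=None)
    # W_next: (n_out, n_neurons); fused result has one column per kept neuron.
    return W_next[:, keep] + W_next[:, drop] @ coeffs.T

# Toy usage on synthetic data.
rng = np.random.default_rng(0)
acts = rng.standard_normal((20, 64))     # 20 neurons observed on 64 inputs
W_next = rng.standard_normal((10, 20))   # next layer: 10 outputs, 20 inputs
keep = select_diverse(build_kernel(acts, bandwidth=5.0), k=8)
W_small = fuse(acts, W_next, keep)
print(sorted(keep), W_small.shape)       # 8 diverse neurons kept; fused weights (10, 8)

Running the toy example prunes the 20-neuron layer down to 8 diverse neurons and produces fused next-layer weights of shape (10, 8), so the downstream layer needs no other change.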
