Learning Deep Parsimonious Representations

In this paper we aim to facilitate generalization in deep networks while supporting interpretability of the learned representations. Towards this goal, we propose a clustering-based regularization that encourages parsimonious representations. Our k-means style objective is easy to optimize and flexible, supporting various forms of clustering, including sample and spatial clustering as well as co-clustering. We demonstrate the effectiveness of our approach on unsupervised learning, classification, fine-grained categorization, and zero-shot learning.
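
To make the idea of a "k-means style objective" on activations concrete, here is a minimal pure-Python sketch. It assumes the sample-clustering form R = 1/(2N) * sum_n ||h_n - mu_{c_n}||^2 over a batch of N activation vectors h_n, with hard assignments c_n and centers mu_k updated by alternating assignment and mean steps, as in ordinary k-means. All function names are hypothetical, this is not the authors' code, and the spatial variant (each position of a C x H x W feature map treated as one C-dimensional item) is an assumed reading of "spatial clustering". Co-clustering is not shown.

```python
from typing import List, Sequence

Vector = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(h: Sequence[Vector], centers: Sequence[Vector]) -> List[int]:
    """Hard-assign each activation vector to its nearest center (k-means E-step)."""
    return [min(range(len(centers)), key=lambda k: sq_dist(v, centers[k])) for v in h]


def update_centers(h: Sequence[Vector], labels: Sequence[int],
                   centers: Sequence[Vector]) -> List[Vector]:
    """Move each center to the mean of its members (k-means M-step).
    Empty clusters keep their previous center."""
    dim = len(h[0])
    new_centers = []
    for k, old in enumerate(centers):
        members = [v for v, lab in zip(h, labels) if lab == k]
        if members:
            new_centers.append([sum(v[d] for v in members) / len(members) for d in range(dim)])
        else:
            new_centers.append(list(old))
    return new_centers


def kmeans_penalty(h: Sequence[Vector], centers: Sequence[Vector],
                   labels: Sequence[int]) -> float:
    """Assumed regularizer R = 1/(2N) * sum_n ||h_n - mu_{c_n}||^2."""
    return sum(sq_dist(v, centers[lab]) for v, lab in zip(h, labels)) / (2 * len(h))


def penalty_grad(h: Sequence[Vector], centers: Sequence[Vector],
                 labels: Sequence[int]) -> List[Vector]:
    """dR/dh_n = (h_n - mu_{c_n}) / N, holding centers and assignments fixed.
    This is the term that would be backpropagated into the network."""
    n = len(h)
    return [[(x - m) / n for x, m in zip(v, centers[lab])] for v, lab in zip(h, labels)]


def spatial_vectors(fmap: Sequence[Sequence[Sequence[float]]]) -> List[Vector]:
    """Hypothetical spatial variant: turn one C x H x W feature map (nested lists)
    into H*W vectors of length C, one per spatial position, to be clustered."""
    channels, height, width = len(fmap), len(fmap[0]), len(fmap[0][0])
    return [[fmap[c][i][j] for c in range(channels)]
            for i in range(height) for j in range(width)]


if __name__ == "__main__":
    # Toy batch of 2-D "activations" forming two loose groups.
    h = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [4.9, 5.0]]
    centers = [[0.0, 0.0], [1.0, 1.0]]  # deliberately poor initial centers
    for step in range(3):
        labels = assign(h, centers)
        before = kmeans_penalty(h, centers, labels)
        centers = update_centers(h, labels, centers)
        after = kmeans_penalty(h, centers, labels)
        print(step, labels, round(before, 4), round(after, 4))
```

In a training loop one would presumably add lambda * R to the task loss (lambda being a hypothetical weighting hyperparameter), let autodiff supply the gradient shown in penalty_grad, and refresh assignments and centers periodically. Each mean step cannot increase R for fixed assignments, which is one reason such an objective is easy to optimize.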