A new algorithm for training sparse autoencoders

Data representation plays an important role in the performance of machine learning algorithms. Since raw data rarely comes in a form well suited to learning, considerable effort has gone into finding more useful representations, and among the many approaches, sparse representation has gained popularity in recent years. In this paper, we propose a new sparse autoencoder obtained by imposing the square of the smoothed L0 norm of the data representation on the hidden layer of a regular autoencoder. Squaring the smoothed L0 norm encourages each data representation to be "individually" sparse. Moreover, once the model parameters are learned, the sparse representation of any new input is obtained by a single matrix-vector multiplication, without solving any further optimization problem. Experiments on the MNIST, CIFAR-10, and OPTDIGITS datasets show that the proposed model guarantees a sparse representation for each input, which leads to better classification results.
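
The sketch below illustrates the kind of penalty described above on a one-hidden-layer autoencoder. It is a minimal PyTorch example written from the abstract alone: the layer sizes, the sigmoid activation, and the hyperparameters `lam` and `sigma` are assumptions made for illustration, not values taken from the paper.

```python
import torch
import torch.nn as nn

def smoothed_l0(h, sigma=0.1):
    # Smoothed L0 norm (Mohimani, Babaie-Zadeh, Jutten, 2008): the number of
    # "active" entries of h, approximated by n - sum_i exp(-h_i^2 / (2*sigma^2)).
    n = h.shape[-1]
    return n - torch.exp(-h.pow(2) / (2 * sigma ** 2)).sum(dim=-1)

class SparseAutoencoder(nn.Module):
    # Illustrative single-hidden-layer autoencoder; sizes are placeholders.
    def __init__(self, n_in=784, n_hidden=256):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_in, n_hidden), nn.Sigmoid())
        self.decoder = nn.Linear(n_hidden, n_in)

    def forward(self, x):
        h = self.encoder(x)            # sparse code: one affine map per input
        return self.decoder(h), h

def loss_fn(x, x_hat, h, lam=1e-3, sigma=0.1):
    # Reconstruction error plus the *square* of the smoothed L0 norm of each
    # per-example hidden code, as the abstract describes; lam and sigma are
    # assumed hyperparameters.
    recon = ((x - x_hat) ** 2).sum(dim=-1).mean()
    sparsity = (smoothed_l0(h, sigma) ** 2).mean()
    return recon + lam * sparsity

# Usage: after training, the sparse code of new data is just the encoder output.
model = SparseAutoencoder()
x = torch.rand(32, 784)                # a batch of flattened MNIST-sized images
x_hat, h = model(x)
loss_fn(x, x_hat, h).backward()
```

Because the penalty acts on each example's hidden code separately, sparsity is promoted per input rather than only on average over the batch, which is the "individually sparse" behaviour the abstract refers to.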
