Semi-supervised auto-encoder based on manifold learning

The auto-encoder is a popular representation learning technique that captures the generative structure of data via an encoding and decoding procedure, typically driven by reconstruction error in an unsupervised way. In this paper, we propose a semi-supervised manifold-learning-based auto-encoder (named semAE). semAE builds on a regularized auto-encoder framework and leverages semi-supervised manifold learning to impose regularization on the encoded representation. The proposed approach suits the practical scenario in which a small amount of labeled data is available in addition to a large amount of unlabeled data. Experiments on several well-known benchmark datasets validate the efficacy of semAE in terms of both representation quality and classification accuracy. Comparisons with state-of-the-art representation learning methods on classification performance in semi-supervised settings demonstrate the superiority of our approach.
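
The abstract does not reproduce the paper's objective function, so the following PyTorch sketch shows only one plausible reading of the description above: a reconstruction loss plus a graph-Laplacian penalty on the encoded representation, with the neighborhood graph built from both the unlabeled geometry and the few available labels. All names (SemiSupervisedAE, semi_supervised_affinity, semae_loss) and hyper-parameters (k, sigma, lam) are illustrative assumptions, not the authors' specification.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SemiSupervisedAE(nn.Module):
    """Plain one-layer auto-encoder; the manifold penalty acts on its code z."""
    def __init__(self, in_dim, hid_dim):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, hid_dim), nn.Sigmoid())
        self.decoder = nn.Sequential(nn.Linear(hid_dim, in_dim), nn.Sigmoid())

    def forward(self, x):
        z = self.encoder(x)
        return z, self.decoder(z)

def semi_supervised_affinity(x, y, labeled, k=5, sigma=1.0):
    """Gaussian k-NN affinities over the batch (hypothetical construction).
    Pairs where both points are labeled are overridden by the labels:
    same class attracts, different classes are disconnected. This is the
    'semi-supervised' part of the neighborhood graph."""
    d2 = torch.cdist(x, x).pow(2)
    w = torch.exp(-d2 / (2 * sigma ** 2))
    idx = d2.topk(k + 1, largest=False).indices      # k neighbours + self
    mask = torch.zeros_like(w).scatter_(1, idx, 1.0)
    w = w * mask
    both = labeled.unsqueeze(0) & labeled.unsqueeze(1)
    same = y.unsqueeze(0) == y.unsqueeze(1)
    w = torch.where(both & same, torch.ones_like(w), w)
    w = torch.where(both & ~same, torch.zeros_like(w), w)
    w = torch.maximum(w, w.t())                      # symmetrize the graph
    w.fill_diagonal_(0.0)
    return w

def semae_loss(model, x, y, labeled, lam=0.1):
    """Reconstruction error plus a graph-Laplacian penalty on the codes,
    sum_ij W_ij ||z_i - z_j||^2, so that neighbouring inputs (and same-class
    labeled inputs) are encouraged to share nearby representations."""
    z, x_hat = model(x)
    recon = F.mse_loss(x_hat, x)
    w = semi_supervised_affinity(x, y, labeled)
    manifold = (w * torch.cdist(z, z).pow(2)).sum() / w.sum().clamp(min=1e-8)
    return recon + lam * manifold

# Toy run: 100 points of dimension 20, only the first 10 carrying labels.
x = torch.rand(100, 20)
y = torch.randint(0, 3, (100,))
labeled = torch.zeros(100, dtype=torch.bool)
labeled[:10] = True
model = SemiSupervisedAE(20, 8)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(200):
    opt.zero_grad()
    semae_loss(model, x, y, labeled).backward()
    opt.step()
```

Computing the affinity graph per mini-batch, as above, is one convenient choice; a fixed graph precomputed over the whole training set would serve equally well for the Laplacian term and is common in manifold-regularization work.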
