Training of Sparsely Connected MLPs

Sparsely connected Multi-Layer Perceptrons (MLPs) differ from conventional MLPs in that only a small fraction of the entries in their weight matrices are nonzero, so sparse matrix-vector multiplication algorithms can be used to reduce the computational cost of classification. Training of sparsely connected MLPs proceeds in two consecutive stages. In the first stage, initial values for the network's parameters are obtained by solving an unsupervised matrix factorization problem that minimizes the reconstruction error. In the second stage, a modified version of the supervised backpropagation algorithm optimizes the MLP's parameters with respect to the classification error. Experiments on the MNIST database of handwritten digits show that the proposed approach matches the classification performance of a densely connected MLP while speeding up classification by a factor of seven.
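To make the sparse-inference idea concrete, the following is a minimal illustrative sketch in Python, not the authors' implementation: it assumes a single hidden layer whose input-to-hidden weight matrix is stored in SciPy's CSR format, so that the matrix-vector product only touches the nonzero connections. The layer sizes, sparsity level, and tanh activation are placeholder assumptions.

    import numpy as np
    import scipy.sparse as sp

    rng = np.random.default_rng(0)

    n_in, n_hidden, n_out = 784, 256, 10  # MNIST-like sizes (assumed for illustration)
    density = 0.1                         # fraction of nonzero hidden weights (assumed)

    # Sparse input-to-hidden weights in CSR format. In the described approach these
    # would be initialized by the unsupervised matrix factorization stage and then
    # refined by the modified backpropagation stage, not drawn at random as here.
    W1 = sp.random(n_hidden, n_in, density=density, format="csr", random_state=0)
    b1 = np.zeros(n_hidden)
    W2 = rng.standard_normal((n_out, n_hidden)) * 0.01  # dense output layer
    b2 = np.zeros(n_out)

    def classify(x):
        # Sparse matrix-vector product: cost is proportional to the number of
        # nonzeros in W1 rather than to n_hidden * n_in.
        h = np.tanh(W1 @ x + b1)
        return int(np.argmax(W2 @ h + b2))

    x = rng.random(n_in)                  # stand-in for a normalized MNIST image
    print(classify(x))

The speed-up reported in the abstract comes from exactly this effect: with roughly 10% of the weights nonzero, the dominant input-to-hidden product in the sketch above does about one tenth of the multiply-adds of its dense counterpart.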
