Scaling up Multidimensional Recurrent Neural Networks for image segmentation

This work is the final result of the double degree program in association with USI (Università della Svizzera Italiana) and was conducted at the Swiss AI lab IDSIA (Istituto Dalle Molle di Studi sull'Intelligenza Artificiale). Interest in neural networks and deep learning has grown steadily in the machine learning community. In this context, Multidimensional Recurrent Neural Networks (MDRNNs) extend the power of recurrent neural networks (RNNs) to data beyond simple sequences, and they have the potential to advance the state of the art on many challenging problems, such as image segmentation and video classification. Given this interest in MDRNNs, the main contribution of this thesis is an efficient implementation of an MDLSTM layer, the Long Short-Term Memory (LSTM) extension of MDRNNs. Moreover, a simple CPU parallelization of the algorithm is implemented and evaluated against the single-core version; this parallelization achieves speedups comparable to those reported for convolutional neural networks with GPU implementations. Finally, the MDLSTM layer is integrated into a hybrid architecture with convolutional layers to perform semantic image segmentation, a difficult and unsolved problem in computer vision. This architecture achieves near state-of-the-art results on a popular image segmentation benchmark with no need for any pre- or post-processing.
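To make the parallelization claim concrete, below is a minimal sketch of the standard diagonal-wise schedule for a 2D recurrent scan: cells on the same anti-diagonal (i + j = d) depend only on earlier diagonals, so they can be updated in parallel while diagonals are processed in order. This is an illustration under stated assumptions, not the thesis's actual implementation: it uses a scalar state per pixel and a plain tanh recurrence in place of the full MDLSTM gating, and the weight names (w_in, w_up, w_left) are hypothetical.

```cpp
// Minimal sketch: diagonal-wise CPU parallelization of a 2D recurrent scan.
// Cells on the same anti-diagonal (i + j == d) have no dependency on each
// other, so they can be updated in parallel; diagonals run in order.
// Scalar state and a plain tanh recurrence stand in for the full MDLSTM
// gating; the weights below are illustrative placeholders.
#include <cmath>
#include <cstdio>
#include <vector>

int main() {
    const int H = 4, W = 5;                        // toy image size
    const float w_in = 0.5f, w_up = 0.3f, w_left = 0.3f;

    std::vector<float> x(H * W, 1.0f);             // dummy input image
    std::vector<float> h(H * W, 0.0f);             // hidden state per pixel

    // Sweep anti-diagonals d = i + j from top-left to bottom-right.
    for (int d = 0; d <= H + W - 2; ++d) {
        const int i_lo = d < W ? 0 : d - W + 1;
        const int i_hi = d < H ? d : H - 1;
        // All cells on this diagonal only read states from diagonal d - 1,
        // so the loop body is safe to run in parallel (compile with -fopenmp).
        #pragma omp parallel for
        for (int i = i_lo; i <= i_hi; ++i) {
            const int j = d - i;
            const float up   = i > 0 ? h[(i - 1) * W + j] : 0.0f;
            const float left = j > 0 ? h[i * W + (j - 1)] : 0.0f;
            h[i * W + j] = std::tanh(w_in * x[i * W + j] +
                                     w_up * up + w_left * left);
        }
    }

    printf("h[H-1][W-1] = %f\n", h[(H - 1) * W + (W - 1)]);
    return 0;
}
```

The same schedule extends to a full MDLSTM layer, which in the 2D case performs one such sweep from each of the four image corners; the available parallelism is bounded by the longest diagonal, i.e. min(H, W) independent cells.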
