What is the best multi-stage architecture for object recognition?

In many recent object recognition systems, each feature extraction stage is composed of a filter bank, a non-linear transformation, and some sort of feature pooling layer. Most systems use either a single stage of feature extraction, in which the filters are hard-wired, or two stages, in which the filters of one or both stages are learned in a supervised or unsupervised manner. This paper addresses three questions: 1. How do the non-linearities that follow the filter banks influence recognition accuracy? 2. Does learning the filter banks in an unsupervised or supervised manner improve performance over random or hard-wired filters? 3. Is there any advantage to using an architecture with two stages of feature extraction rather than one? We show that using non-linearities that include rectification and local contrast normalization is the single most important ingredient for good accuracy on object recognition benchmarks. We show that two stages of feature extraction yield better accuracy than one. Most surprisingly, we show that a two-stage system with random filters can reach a recognition rate of almost 63% on Caltech-101, provided that the proper non-linearities and pooling layers are used. Finally, we show that with supervised refinement the system achieves state-of-the-art performance on the NORB dataset (5.6% error rate), and that unsupervised pre-training followed by supervised refinement produces good accuracy on Caltech-101 (> 65%) and the lowest known error rate on the undistorted, unprocessed MNIST dataset (0.53%).
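The abstract describes each feature extraction stage as a filter bank, followed by non-linearities, followed by pooling, and reports that rectification plus local contrast normalization matters most, even when the filters are random. The following is a minimal pure-Python sketch of such a stage, written under stated assumptions rather than taken from the paper. It uses absolute-value rectification, a simplified local contrast normalization with a uniform window and a small floor `eps`, non-overlapping average pooling, and fully connected random filters drawn uniformly from [-1, 1]. All helper names (`random_filters`, `filter_bank`, `rectify_abs`, `local_contrast_normalize`, `average_pool`, `stage`) and all sizes are hypothetical. The sketch omits any squashing non-linearity, filter learning, supervised refinement, and the final classifier.

```python
import math
import random


def random_filters(n_out, n_in, size, seed=0):
    """Hypothetical helper: random, unlearned filters with weights uniform in [-1, 1].

    Shape: n_out x n_in x size x size (every output map sees every input map; assumed).
    """
    rng = random.Random(seed)
    return [[[[rng.uniform(-1.0, 1.0) for _ in range(size)] for _ in range(size)]
             for _ in range(n_in)] for _ in range(n_out)]


def filter_bank(maps, filters):
    """Valid 2-D cross-correlation; each output map sums over all input maps."""
    k = len(filters[0][0])
    h, w = len(maps[0]) - k + 1, len(maps[0][0]) - k + 1
    return [[[sum(maps[c][i + u][j + v] * f[c][u][v]
                  for c in range(len(maps)) for u in range(k) for v in range(k))
              for j in range(w)] for i in range(h)] for f in filters]


def rectify_abs(maps):
    """Absolute-value rectification, applied elementwise to every map."""
    return [[[abs(x) for x in row] for row in m] for m in maps]


def _window(maps, i, j, r):
    """Values of all maps in the (2r+1) x (2r+1) neighborhood of (i, j), clipped at borders."""
    h, w = len(maps[0]), len(maps[0][0])
    return [m[a][b] for m in maps
            for a in range(max(0, i - r), min(h, i + r + 1))
            for b in range(max(0, j - r), min(w, j + r + 1))]


def local_contrast_normalize(maps, r=1, eps=1e-6):
    """Simplified local contrast normalization, jointly over space and across maps.

    Subtractive step: remove the mean of a uniform local window (assumed; a
    Gaussian-weighted window is a common alternative).
    Divisive step: divide by the local standard deviation of the centered
    values, floored at eps to avoid division by zero.
    """
    h, w = len(maps[0]), len(maps[0][0])
    mean = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            win = _window(maps, i, j, r)
            mean[i][j] = sum(win) / len(win)
    centered = [[[m[i][j] - mean[i][j] for j in range(w)] for i in range(h)] for m in maps]
    std = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            win = _window(centered, i, j, r)
            std[i][j] = max(math.sqrt(sum(x * x for x in win) / len(win)), eps)
    return [[[m[i][j] / std[i][j] for j in range(w)] for i in range(h)] for m in centered]


def average_pool(maps, p=2):
    """Non-overlapping p x p average pooling (incomplete border blocks are dropped)."""
    return [[[sum(m[i * p + a][j * p + b] for a in range(p) for b in range(p)) / (p * p)
              for j in range(len(m[0]) // p)] for i in range(len(m) // p)] for m in maps]


def stage(maps, filters):
    """One stage: filter bank -> abs rectification -> contrast normalization -> average pooling."""
    return average_pool(local_contrast_normalize(rectify_abs(filter_bank(maps, filters))))


if __name__ == "__main__":
    rng = random.Random(42)
    image = [[[rng.random() for _ in range(12)] for _ in range(12)]]  # 1 input map, 12x12
    s1 = stage(image, random_filters(4, 1, 3, seed=1))  # 4 maps: 10x10 -> 5x5
    s2 = stage(s1, random_filters(8, 4, 3, seed=2))     # 8 maps: 3x3 -> 1x1
    print(len(s1), len(s1[0]), len(s1[0][0]))  # 4 5 5
    print(len(s2), len(s2[0]), len(s2[0][0]))  # 8 1 1
```

Calling `stage` twice, as in the demo, gives a two-stage system with random filters. In a complete recognizer, a classifier would then be trained on the flattened output of the last stage.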
