High Dimensional Classification with Bayesian Neural Networks and Dirichlet Diffusion Trees

Our winning entry in the NIPS 2003 challenge was a hybrid, in which our predictions for the five data sets were made using different methods of classification, or, for the Madelon data set, by averaging the predictions produced using two methods. However, two aspects of our approach were the same for all data sets: We reduced the number of features used for classification to no more than a few hundred, either by selecting a subset of features using simple univariate significance tests, or by performing a global dimensionality reduction using Principal Component Analysis (PCA). We then applied a classification method based on Bayesian learning, using an Automatic Relevance Determination (ARD) prior that allows the model to determine which of these features are most relevant.
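
As a rough illustration of the two feature-reduction routes described above, the following minimal sketch uses NumPy on synthetic data. It is not the code used for the challenge entry. The text does not say which univariate test was used, so the Welch t-statistic here is an assumption, and the function names `select_by_t_statistic` and `pca_reduce`, the data dimensions, and the numbers of retained features are all hypothetical. The subsequent Bayesian classification step with an ARD prior is not shown.

```python
import numpy as np

def select_by_t_statistic(X, y, k):
    """Return indices of the k features with the largest |Welch t| between the two classes.

    Assumption: a two-sample t-statistic stands in for the unspecified
    "simple univariate significance tests".
    """
    X0, X1 = X[y == 0], X[y == 1]
    mean_diff = X1.mean(axis=0) - X0.mean(axis=0)
    std_err = np.sqrt(X0.var(axis=0, ddof=1) / len(X0) +
                      X1.var(axis=0, ddof=1) / len(X1))
    t = mean_diff / np.maximum(std_err, 1e-12)  # guard against constant features
    return np.argsort(-np.abs(t))[:k]

def pca_reduce(X, k):
    """Project the rows of X onto the first k principal components."""
    Xc = X - X.mean(axis=0)                     # centre each feature
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

# Synthetic stand-in for a high-dimensional data set (hypothetical sizes).
rng = np.random.default_rng(0)
n_cases, n_features = 200, 5000
y = np.arange(n_cases) % 2                      # balanced binary labels
X = rng.normal(size=(n_cases, n_features))
X[:, :10] += 1.5 * y[:, None]                   # only the first 10 features carry signal

# Route 1: keep a subset of the original features.
selected = select_by_t_statistic(X, y, k=100)
X_selected = X[:, selected]

# Route 2: global dimensionality reduction.
X_pca = pca_reduce(X, k=40)
```

Either reduced matrix (`X_selected` or `X_pca`) would then be passed to the classifier, whose ARD prior can further down-weight whichever of the remaining inputs turn out to be irrelevant.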