Bayesian Backpropagation Over I-O Functions Rather Than Weights

The conventional Bayesian justification for backprop is that it finds the maximum a posteriori (MAP) weight vector. As this paper shows, to find the MAP input-output (i-o) function instead, one must add a correction term to backprop. That term biases the search toward i-o functions with small description lengths, and in particular favors (some kinds of) feature selection, pruning, and weight sharing. This can be viewed as an {\it a priori} argument in favor of those techniques.
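The two MAP problems can differ because many weight vectors may compute the same i-o function, so the posterior of a function is a sum over all the weights that produce it. The following is a minimal toy sketch of that general point and is not the paper's correction term. It assumes a hypothetical two-weight network f_w(x) = w1 * w2 * x on a small discrete weight grid, a uniform prior over that grid, Gaussian output noise, and a single made-up data point. The names `WEIGHT_GRID`, `DATA`, `NOISE_SIGMA` and `io_function` are illustrative only.

```python
import itertools
import math
from collections import defaultdict

# Hypothetical toy network: f_w(x) = w1 * w2 * x, so the i-o function it
# computes is fully determined by the product w1 * w2.
WEIGHT_GRID = [-2, -1, 0, 1, 2]
DATA = [(1.0, 0.7)]   # hypothetical (x, y) training pair
NOISE_SIGMA = 0.5     # assumed Gaussian output-noise standard deviation


def io_function(w):
    """Label the i-o function computed by weight vector w (here: its slope)."""
    w1, w2 = w
    return w1 * w2


def log_likelihood(w):
    """Gaussian-noise log likelihood of DATA under weights w, up to a constant."""
    slope = io_function(w)
    return sum(-(y - slope * x) ** 2 / (2 * NOISE_SIGMA ** 2) for x, y in DATA)


weights = list(itertools.product(WEIGHT_GRID, repeat=2))

# With a uniform prior over the grid, the weight posterior is proportional to
# the likelihood, so the MAP weight vector is the maximum-likelihood one.
map_weight = max(weights, key=log_likelihood)

# Posterior over i-o functions: add up the (unnormalized) posterior mass of
# every weight vector that computes the same function.
function_mass = defaultdict(float)
for w in weights:
    function_mass[io_function(w)] += math.exp(log_likelihood(w))
map_function = max(function_mass, key=function_mass.get)

print("function computed by MAP weights:", io_function(map_weight))
print("MAP i-o function:", map_function)
```

Here the most probable single weight vector computes slope 1, but nine weight vectors compute the zero function while only two compute slope 1. Summing over them makes the zero function the MAP i-o function, even though no individual weight vector for it is the most probable one.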