Bayesian backfitting for high-dimensional regression

Whenever a graphical model contains connections from multiple nodes to a single node, statistical inference of the model parameters may require evaluating, and possibly inverting, the covariance matrix of all variables contributing to such a fan-in, particularly in the context of regression and classification. For high-dimensional fan-ins, statistical inference can therefore become computationally expensive and numerically brittle. In this paper, we propose an EM-based estimation method that statistically decouples the inputs by introducing a hidden variable in each branch of the fan-in. As a result, the algorithm has a per-iteration complexity that is only linear in the order of the fan-in. Interestingly, the resulting algorithm can be interpreted as a probabilistic version of backfitting and, consequently, is ideally suited for applications of backfitting that require probabilities to be propagated cleanly, as in Bayesian inference. We demonstrate the effectiveness of Bayesian backfitting in dealing with extremely high-dimensional, underconstrained regression problems. In addition, we highlight its connection to probabilistic partial least squares regression, as well as its extensions to nonlinear datasets through variational Bayesian mixture of experts regression and nonparametric locally weighted learning.
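For intuition, the following is a minimal NumPy sketch of the resulting EM update for the linear case, in which each input branch carries one hidden variable and the posterior over these hidden variables distributes the fit residual across branches. It is a simplified illustration, not the full algorithm: the per-branch noise variances are held fixed rather than estimated, no priors are placed on the regression coefficients, and the function name and defaults are purely illustrative. Each iteration touches only per-dimension sufficient statistics, so its cost is linear in the number of inputs and no covariance matrix over all inputs is inverted.

```python
import numpy as np


def probabilistic_backfitting(X, y, n_iter=1000, psi_y=1.0, psi_z=None):
    """Sketch of EM-based probabilistic backfitting for y = sum_d b_d * x_d + noise.

    Each input branch d has a hidden variable z_d ~ N(b_d * x_d, psi_z[d]) and the
    output is y ~ N(sum_d z_d, psi_y).  Noise variances are assumed fixed here.
    """
    N, d = X.shape
    if psi_z is None:
        psi_z = np.ones(d)           # per-branch hidden-variable noise variances (assumed)
    b = np.zeros(d)                  # regression coefficients
    s = psi_y + psi_z.sum()          # total output variance of the fan-in
    xx = (X ** 2).sum(axis=0)        # per-dimension sufficient statistics, computed once

    for _ in range(n_iter):
        resid = y - X @ b            # residual of the current fit, O(N * d)
        # E-step (implicit): the posterior mean of z_d assigns a share psi_z[d] / s
        # of the residual to branch d.  M-step: regress these expected hidden
        # targets on x_d, which collapses to a backfitting-style coefficient update.
        b = b + (psi_z / s) * (X.T @ resid) / xx
    return b


# Toy usage on a well-conditioned synthetic problem.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
b_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ b_true + 0.1 * rng.standard_normal(200)
print(probabilistic_backfitting(X, y, n_iter=2000).round(2))
```

Because the residual is shared out according to the per-branch variances, the update never forms or inverts the d-by-d input covariance matrix, which is the source of the linear per-iteration complexity claimed above.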