Variational inference in graphical models: The view from the marginal polytope

Underlying a variety of techniques for approximate inference (among them mean field, sum-product, and cluster variational methods) is a classical variational principle from statistical physics, which involves a “free energy” optimization problem over the set of all distributions. Working within the framework of exponential families, we describe an alternative view, in which the optimization takes place over the (typically) much lower-dimensional space of mean parameters. The associated constraint set consists of all mean parameters that are globally realizable; for discrete random variables, we refer to this set as a marginal polytope. Unlike the classical formulation, the representation given here makes explicit that variational inference algorithms have two distinct components: (a) an approximation to the entropy function; and (b) an approximation to the marginal polytope. This viewpoint exposes the essential ingredients of known variational methods and also suggests novel relaxations. Taking the “zero-temperature limit” recovers a variational representation for MAP computation as a linear program (LP) over the marginal polytope. For trees, the max-product updates are a dual method for solving this LP, yielding a variational viewpoint that unifies the sum-product and max-product algorithms.
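To make the mean-parameter view concrete, here is a minimal brute-force sketch, assuming a hypothetical toy model that is not taken from the paper: a binary pairwise exponential family on the three-node chain 0 - 1 - 2, with sufficient statistics x_s for each node and x_s x_t for each edge. It checks three claims numerically. First, the variational principle A(θ) = max over μ in the marginal polytope of ⟨θ, μ⟩ + H(μ) attains its optimum at the true mean parameters μ = E[φ(X)]. Second, on a tree the entropy decomposes exactly into node entropies minus edge mutual informations (the Bethe form). Third, A(βθ)/β approaches max_x ⟨θ, φ(x)⟩ as β grows, which is the value of the LP over the marginal polytope, since a linear objective over the convex hull of the φ(x) is maximized at a vertex. All names (`features`, `bethe_entropy`, and so on) are illustrative, and exhaustive enumeration is only feasible for tiny models.

```python
import itertools
import math

# Hypothetical toy model (an assumption for illustration): binary pairwise
# exponential family on the 3-node chain 0 - 1 - 2, which is a tree.
NODES = [0, 1, 2]
EDGES = [(0, 1), (1, 2)]
CONFIGS = list(itertools.product([0, 1], repeat=len(NODES)))


def features(x):
    """Sufficient statistics phi(x): x_s per node, x_s * x_t per edge."""
    return [x[s] for s in NODES] + [x[s] * x[t] for s, t in EDGES]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def log_partition(theta):
    """A(theta) = log sum_x exp<theta, phi(x)>, computed by enumeration."""
    scores = [dot(theta, features(x)) for x in CONFIGS]
    m = max(scores)  # subtract the max for numerical stability
    return m + math.log(sum(math.exp(s - m) for s in scores))


def distribution(theta):
    a = log_partition(theta)
    return {x: math.exp(dot(theta, features(x)) - a) for x in CONFIGS}


def mean_parameters(p):
    """mu = E_p[phi(X)]: a point of the marginal polytope."""
    mu = [0.0] * (len(NODES) + len(EDGES))
    for x, px in p.items():
        for i, f in enumerate(features(x)):
            mu[i] += px * f
    return mu


def entropy(probs):
    return -sum(q * math.log(q) for q in probs if q > 0)


def bethe_entropy(p):
    """Sum of node entropies minus edge mutual informations (exact on trees)."""
    def marginal(idx):
        m = {}
        for x, px in p.items():
            key = tuple(x[i] for i in idx)
            m[key] = m.get(key, 0.0) + px
        return list(m.values())

    h_nodes = sum(entropy(marginal((s,))) for s in NODES)
    mutual_info = sum(
        entropy(marginal((s,))) + entropy(marginal((t,))) - entropy(marginal((s, t)))
        for s, t in EDGES
    )
    return h_nodes - mutual_info


theta = [0.5, -1.0, 0.3, 1.2, -0.7]  # arbitrary illustrative parameters
p = distribution(theta)
mu = mean_parameters(p)

# Variational principle at its optimum: A(theta) = <theta, mu> + H(mu).
print(abs(log_partition(theta) - (dot(theta, mu) + entropy(p.values()))) < 1e-9)
# On a tree, the Bethe entropy equals the exact entropy.
print(abs(entropy(p.values()) - bethe_entropy(p)) < 1e-9)

# Zero-temperature limit: A(beta * theta) / beta -> max_x <theta, phi(x)>,
# with a gap between 0 and log(number of configurations) / beta.
map_value = max(dot(theta, features(x)) for x in CONFIGS)
for beta in (1, 10, 100):
    print(beta, log_partition([beta * t for t in theta]) / beta - map_value)
```

On a graph with cycles the Bethe form is no longer exact, and replacing the true entropy with it is precisely component (a) of the approximation described above.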