Energy-Based Factor Graphs for Prediction in Relational Data

A majority of supervised learning algorithms process each input point independently of the others, based on the assumption that the data are sampled independently and identically from a fixed underlying distribution. However, in a number of real-world problems the data inherently possess a relational structure: the values of the variables associated with each sample depend not only on features specific to that sample, but also on the features and variables of other, related samples. Prices of real estate properties exhibit such a relational structure. The price of a house is a function of features specific to that house, such as the number of bedrooms. It is also influenced by features of the neighborhood in which the house lies, some of which are measurable, such as the quality of the local schools. However, most of the features that make a particular neighborhood desirable are very difficult to measure directly, and are merely reflected in the prices of houses in that neighborhood. Hence the “desirability” of a location/neighborhood can be modeled as a latent variable that must be estimated as part of the learning process and efficiently inferred for unseen samples.

A number of authors have recently proposed architectures and learning algorithms that make use of relational structure. The earlier techniques were based on the idea of influence propagation [1, 3, 6, 5]. Probabilistic Relational Models (PRMs) were introduced in [4, 2] as an extension of Bayesian networks to relational data. Their discriminative extensions, called Relational Markov Networks (RMNs), were later proposed in [7].

This paper introduces a general framework for prediction in relational data called Energy-Based Relational Factor Graphs (EBRFG). An architecture is presented that allows efficient inference algorithms for continuous variables with relational dependencies. The class of models introduced is novel in several ways:

1. it pertains to relational regression problems in which the answer variable is continuous;
2. it allows inter-sample dependencies through hidden variables (which may also be continuous), as well as through the answer variable;
3. it allows log-likelihood functions that are non-linear in the parameters, which leads to non-convex loss functions but considerably greater flexibility;
4. it allows the use of non-parametric models for the relational factors, while providing an efficient inference algorithm for new samples;
5. it eliminates the intractable partition function problem through appropriate design of the relational and non-relational factors.

The idea behind a relational factor graph is to have a single graph that models the entire collection of data samples. The relationships between samples are captured by factors that connect the variables associated with multiple samples. We are given a set of $N$ training samples, each described by a sample-specific feature vector $X_i$ and an answer $Y_i$ to be predicted. Let the collection of input variables be denoted by $\mathbf{X} = \{X_i,\ i = 1 \ldots N\}$, the output variables by $\mathbf{Y} = \{Y_i,\ i = 1 \ldots N\}$, and the latent variables by $\mathbf{Z}$. The EBRFG is defined by an energy function of the form $E(W, \mathbf{Z}, \mathbf{Y}, \mathbf{X}) = E(W, \mathbf{Z}, Y_1, \ldots, Y_N, X_1, \ldots, X_N)$, in which $W$ is the set of parameters to be estimated by learning. Given a test sample with feature vector $X_0$, the model is used to predict the value of the corresponding answer variable $Y_0$.
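To make the factor structure concrete, consider the following illustrative decomposition (a sketch for exposition only, with hypothetical factor names $E_s$ and $E_r$; the paper's actual factors are a design choice, cf. point 5 above) into non-relational factors, each depending on a single sample, and relational factors, each coupling the variables of a group of related samples:

$$E(W, \mathbf{Z}, \mathbf{Y}, \mathbf{X}) = \sum_{i=1}^{N} E_s(W, Y_i, X_i) + \sum_{c \in \mathcal{C}} E_r\big(W, Z_c, \{Y_i\}_{i \in c}, \{X_i\}_{i \in c}\big),$$

where each clique $c$ gathers the variables of a group of related samples (e.g., the houses in one neighborhood) and $Z_c$ is the latent variable attached to that clique, such as the neighborhood's desirability.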
One way to perform this prediction is to minimize the energy function augmented with the test sample $(X_0, Y_0)$:

$$Y_0^* = \arg\min_{Y_0} \Big\{ \min_{\mathbf{Z}} E(W, \mathbf{Z}, Y_0, Y_1, \ldots, Y_N, X_0, X_1, \ldots, X_N) \Big\}. \qquad (1)$$
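As a concrete illustration of this inference step, the following is a minimal sketch assuming simple quadratic factors: a non-relational factor $(Y_i - W^\top X_i)^2$ and a relational factor $(Y_i - Z_{n(i)})^2$ tying each answer to the latent desirability of its neighborhood $n(i)$. The factor forms, the `energy`/`predict` helpers, and the use of scipy's general-purpose minimizer are illustrative assumptions, not the paper's architecture.

```python
import numpy as np
from scipy.optimize import minimize

def energy(w, z, y, x, nbhd):
    """Quadratic stand-ins for the factors (illustrative assumption).

    Non-relational factors: (y_i - w . x_i)^2 for each sample.
    Relational factors:     (y_i - z[nbhd_i])^2, tying each answer to the
                            latent desirability of its neighborhood.
    """
    return np.sum((y - x @ w) ** 2) + np.sum((y - z[nbhd]) ** 2)

def predict(w, z0, y_train, x_train, nbhd_train, x_test, nbhd_test):
    """Equation (1): minimize the energy, augmented with the test sample,
    jointly over the test answer Y_0 and the latent variables Z, keeping
    the observed training answers Y_1 .. Y_N fixed."""
    def augmented(v):
        y0, z = v[0], v[1:]
        return (energy(w, z, y_train, x_train, nbhd_train)
                + (y0 - x_test @ w) ** 2 + (y0 - z[nbhd_test]) ** 2)
    v0 = np.concatenate(([x_test @ w], z0))  # warm start: regression guess for Y_0, current Z
    return minimize(augmented, v0).x[0]      # Y_0^*
```

For example, with `x_train` of shape `(N, d)`, a parameter vector `w` of shape `(d,)`, and `nbhd_train` an integer array mapping each sample to its neighborhood, `predict` returns the energy-minimizing answer for the test sample. Under point 4 above, the quadratic relational stand-in would be replaced by a non-parametric factor.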