Fast and Flexible Inference of Joint Distributions from their Marginals

Across the social sciences and elsewhere, practitioners frequently have to reason about relationships between random variables despite lacking joint observations of those variables. This is sometimes called “ecological” inference: given samples from the marginal distributions of the variables, one attempts to infer their joint distribution. The problem is inherently ill-posed, yet only a few models have been proposed for bringing prior information into it, and these often rely on restrictive or unrealistic assumptions and lack a unified approach. In this paper, we treat the inference problem generally and propose a unified class of models that encompasses some of those previously proposed while including many new ones. Previous work has relied on either relaxation or approximate inference via MCMC, with the latter known to mix prohibitively slowly for this type of problem. Here we instead give a single exact inference algorithm that works for the entire model class via an efficient fixed-point iteration called Dykstra’s method. We investigate empirically both the computational cost of our algorithm and the accuracy of the new models on real datasets, showing favorable performance in both cases and illustrating the impact of the increased modeling flexibility enabled by this work.
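
To make the flavor of such a fixed-point iteration concrete, the following is a minimal sketch of a much simpler relative of the method, not the algorithm of this paper. It assumes a strictly positive prior joint table and two known marginals, and alternately rescales rows and columns (iterative proportional fitting, i.e. alternating KL projections onto the two marginal constraints). The function name `fit_joint`, the toy prior, and the marginals are all illustrative assumptions.

```python
import numpy as np

def fit_joint(prior, row_marginal, col_marginal, n_iters=500, tol=1e-10):
    """Alternately rescale rows and columns of a positive prior table
    until its marginals match the targets (iterative proportional fitting)."""
    joint = np.asarray(prior, dtype=float).copy()
    r = np.asarray(row_marginal, dtype=float)
    c = np.asarray(col_marginal, dtype=float)
    for _ in range(n_iters):
        # KL projection onto tables with the target row sums.
        joint *= (r / joint.sum(axis=1))[:, None]
        # KL projection onto tables with the target column sums.
        joint *= (c / joint.sum(axis=0))[None, :]
        # Column sums are now exact; stop once row sums also agree.
        if np.abs(joint.sum(axis=1) - r).max() < tol:
            break
    return joint

# Hypothetical toy problem: a 2x2 prior favoring the diagonal,
# with marginals that the prior itself does not satisfy.
prior = np.array([[0.4, 0.1],
                  [0.1, 0.4]])
rows = np.array([0.7, 0.3])
cols = np.array([0.6, 0.4])
joint = fit_joint(prior, rows, cols)
print(joint)
```

Because each step only multiplies whole rows or whole columns by positive constants, the fitted table keeps the odds ratio of the prior while matching both marginals. This is one simple way in which prior information can select a single joint distribution from the many that are consistent with the marginals.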
