Influencing Elections with Statistics: Targeting Voters with Logistic Regression Trees

Political campaigning has become a multi-million dollar business. A substantial proportion of a campaign's budget is spent on voter targeting, i.e., identifying as many potential voters as possible and influencing them to vote. Campaigns use statistical tools to provide a data-driven basis for deciding whom to target. While the available data are usually rich, campaigns have relied on a rather limited selection, often including only previous voting behavior and one or two demographic variables. Statistical procedures currently used in voter targeting include logistic regression and simple tree methods such as CHAID, but there is growing interest in modern data mining approaches. Along the lines of this development, we propose a novel framework for voter targeting, "Logistic Regression Trees" (LORET). LORET are trees (which may consist of just a single root node) containing a logistic regression (which may have just an intercept) in every leaf. Thus, they contain logistic regression and classification trees as special cases but also allow for a synthesis of both techniques. We explore various flavors of LORET that (a) employ either a reduced or the full set of available variables and (b) structure these variables into regressors in the logistic model components and/or partitioning variables in the tree components. To assess and illustrate the approach, we apply these LORET versions to a data set of 19,634 potential voters from the 2004 US presidential election. We find that employing more predictor variables clearly improves predictive accuracy, with the best results for methods that employ tree induction. Intelligible models are built by LORET with the reduced set of variables as regressors in each leaf. Furthermore, the synthesis of logistic regression and trees leads to models that achieve a low overall cost and a high benefit from convincing non-voters to turn out.
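To make the structure of LORET concrete, the following is a minimal Python sketch, not the implementation used in this work. It assumes a single, hand-specified binary split instead of data-driven tree induction, fits each leaf by plain gradient ascent, and uses a tiny made-up data set. The names `fit_logistic`, `fit_loret`, `voted_before`, and `age` are hypothetical and chosen only for illustration. The sketch shows the two special cases (a root node only, and intercept-only leaves) as well as the synthesis of both.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, steps=2000, lr=0.1):
    """Fit an intercept plus slopes by gradient ascent on the mean log-likelihood."""
    w = [0.0] * (len(X[0]) + 1)  # w[0] is the intercept
    for _ in range(steps):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            err = yi - p
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj + lr * g / len(y) for wj, g in zip(w, grad)]
    return w


def fit_loret(rows, y, regressors, split=None):
    """Fit one logistic regression per leaf and return a prediction function.

    rows       -- list of dicts, one per person
    regressors -- variable names used inside the leaf models (may be empty)
    split      -- None (root node only) or (name, threshold) for one fixed
                  binary split; ASSUMPTION: the split is given by hand here,
                  whereas a real tree would choose its splits from the data
    """
    def leaf_of(row):
        if split is None:
            return 0
        name, threshold = split
        return 0 if row[name] <= threshold else 1

    leaves = {}
    for leaf in ([0] if split is None else [0, 1]):
        idx = [i for i, r in enumerate(rows) if leaf_of(r) == leaf]
        X = [[rows[i][v] for v in regressors] for i in idx]
        leaves[leaf] = fit_logistic(X, [y[i] for i in idx])

    def predict(row):
        w = leaves[leaf_of(row)]
        return sigmoid(w[0] + sum(wj * row[v] for wj, v in zip(w[1:], regressors)))

    return predict


# Made-up toy data (NOT from the 2004 data set): age is rescaled to [0, 1].
rows = [
    {"voted_before": 0, "age": 0.2}, {"voted_before": 0, "age": 0.3},
    {"voted_before": 0, "age": 0.5}, {"voted_before": 0, "age": 0.6},
    {"voted_before": 1, "age": 0.3}, {"voted_before": 1, "age": 0.4},
    {"voted_before": 1, "age": 0.6}, {"voted_before": 1, "age": 0.7},
]
y = [0, 0, 1, 0, 1, 0, 1, 1]  # 1 = turned out to vote

# Special case 1: root node only, regressors in the model -> logistic regression.
logit_only = fit_loret(rows, y, regressors=["age"])
# Special case 2: split, intercept-only leaves -> classification tree.
tree_only = fit_loret(rows, y, regressors=[], split=("voted_before", 0))
# Synthesis: split on one variable, regress on another within each leaf.
synthesis = fit_loret(rows, y, regressors=["age"], split=("voted_before", 0))
```

With intercept-only leaves, each leaf's predicted probability converges to the observed turnout share in that leaf, which is how the tree-only model reduces to an ordinary classification tree. Which variables enter as regressors and which as partitioning variables is exactly the choice distinguishing the LORET flavors described above.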