Nuclear penalized multinomial regression with an application to predicting at bat outcomes in baseball

We propose the nuclear norm penalty as an alternative to the ridge penalty for regularized multinomial regression. This convex relaxation of reduced-rank multinomial regression has the advantage of leveraging underlying structure among the response categories to make better predictions. We apply our method, nuclear penalized multinomial regression (NPMR), to Major League Baseball play-by-play data to predict outcome probabilities based on batter–pitcher matchups. The interpretation of the results meshes well with subject-area expertise and also suggests a novel understanding of what differentiates players.
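
To make the comparison with ridge and reduced-rank regression concrete, the penalized objective can be sketched as follows. The notation here is an assumption rather than something stated in the abstract: n observations (at bats) with feature vectors x_i in R^p, indicator responses y_{ik} over K outcome categories, intercepts \alpha in R^K, a p-by-K coefficient matrix B with columns \beta_k, singular values \sigma_j(B), and a tuning parameter \lambda >= 0. The 1/n scaling of the log-likelihood is likewise a convention chosen for this sketch.

```latex
\begin{aligned}
\min_{\alpha,\, B} \quad
  & -\frac{1}{n} \sum_{i=1}^{n} \sum_{k=1}^{K} y_{ik}
    \log \frac{\exp(\alpha_k + x_i^\top \beta_k)}
              {\sum_{l=1}^{K} \exp(\alpha_l + x_i^\top \beta_l)}
    \;+\; \lambda \, \|B\|_* , \\
\text{where} \quad
  & \|B\|_* = \sum_{j=1}^{\min(p,\,K)} \sigma_j(B).
\end{aligned}
```

Under this notation, ridge regression would replace \|B\|_* with the squared Frobenius norm \|B\|_F^2 = \sum_j \sigma_j(B)^2, while reduced-rank regression would instead impose the non-convex constraint rank(B) <= r, a bound on the number of nonzero singular values. The nuclear norm is the l1 norm of the singular values, which is why it acts as a convex surrogate for rank: it can shrink some singular values exactly to zero, so the fitted coefficients for the K categories share a low-dimensional structure.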
