Relative performance of expected and observed Fisher information in covariance estimation for maximum likelihood estimates

Covariance matrix and confidence interval calculations for maximum likelihood estimates (MLEs) are commonly used in system identification and statistical inference. Constructing accurate confidence intervals typically requires knowing the covariance of the MLE. Standard statistical theory shows that the normalized MLE is asymptotically normally distributed with mean zero and covariance equal to the inverse of the Fisher information matrix (FIM) at the unknown parameter. Two common estimates of the covariance of the MLE are the inverse of the observed FIM (the Hessian of the negative log-likelihood) and the inverse of the expected FIM (the FIM itself). Both the observed and the expected FIM are evaluated at the MLE computed from the sample data. We show that, under reasonable conditions, the inverse of the expected FIM outperforms the inverse of the observed FIM under a mean squared error criterion. This result suggests that, under certain conditions, the expected FIM gives a better estimate of the covariance of the MLE in confidence interval calculations.
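To make the two covariance estimates concrete, the following minimal sketch computes both for a single scalar parameter. The model (location of a Cauchy distribution with unit scale), the data, and all function names are assumptions chosen for illustration and do not come from the paper. This model is convenient because its expected information is n/2 regardless of the parameter, while its observed information depends on the sample.

```python
import statistics

# Assumed model (illustrative only): x_i ~ Cauchy(theta, 1), theta unknown.
# Log-likelihood up to a constant: -sum(log(1 + (x_i - theta)^2)).

def score(xs, theta):
    """First derivative of the log-likelihood with respect to theta."""
    return sum(2.0 * (x - theta) / (1.0 + (x - theta) ** 2) for x in xs)

def observed_info(xs, theta):
    """Observed information: second derivative of the negative log-likelihood."""
    return sum(
        2.0 * (1.0 - (x - theta) ** 2) / (1.0 + (x - theta) ** 2) ** 2
        for x in xs
    )

def expected_info(n):
    """Expected information for n i.i.d. unit-scale Cauchy observations."""
    return n / 2.0

def fit_cauchy_location(xs, iters=50):
    """Fisher scoring started at the sample median (a simple choice that
    is not guaranteed to reach the global maximum in general)."""
    theta = statistics.median(xs)
    n = len(xs)
    for _ in range(iters):
        theta += score(xs, theta) / expected_info(n)
    return theta

xs = [-1.0, 0.0, 1.0]                          # hypothetical sample
theta_hat = fit_cauchy_location(xs)            # both matrices are evaluated at the MLE
var_observed = 1.0 / observed_info(xs, theta_hat)
var_expected = 1.0 / expected_info(len(xs))
print(theta_hat, var_observed, var_expected)
```

For this small symmetric sample the two variance estimates differ, which is the situation the mean squared error comparison addresses. In other samples the observed information can be zero or negative at the evaluation point, so production code would need to guard the inversion.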
