On information divergence measures, surrogate loss functions and decentralized hypothesis testing

We establish a general correspondence between two classes of statistical functions: Ali-Silvey distances (also known as f-divergences) and surrogate loss functions. Ali-Silvey distances play an important role in signal processing and information theory, for instance as error exponents in hypothesis testing problems. Surrogate loss functions (e.g., hinge loss, exponential loss) are the basis of recent advances in statistical learning methods for classification (e.g., the support vector machine, AdaBoost). We provide a connection between these two lines of research, showing how to determine the unique f-divergence induced by a given surrogate loss, and characterizing all surrogate loss functions that realize a given f-divergence. The correspondence between f-divergences and surrogate loss functions has applications to the problem of designing quantization rules for decentralized hypothesis testing in the framework of statistical learning (i.e., when the underlying distributions are unknown, but the learner has access to labeled samples).
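As a minimal numerical sketch of the first ingredient of this correspondence, the snippet below computes f-divergences between two discrete distributions for a few standard choices of the convex generator f (Kullback-Leibler, variational distance, squared Hellinger). The definition D_f(P||Q) = Σ_x q(x) f(p(x)/q(x)) is standard; the particular distributions p, q and the assumption that q is strictly positive are illustrative choices, not taken from the paper.

```python
import numpy as np

def f_divergence(p, q, f):
    """f-divergence D_f(P||Q) = sum_x q(x) * f(p(x)/q(x)),
    for convex f with f(1) = 0; assumes q(x) > 0 everywhere."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum(q * f(p / q)))

# Standard generators f (each convex, with f(1) = 0):
kl          = lambda t: t * np.log(t)         # Kullback-Leibler divergence
variational = lambda t: np.abs(t - 1)         # variational (L1) distance
hellinger   = lambda t: (np.sqrt(t) - 1)**2   # squared Hellinger distance

# Illustrative pair of distributions on a two-point space.
p, q = [0.5, 0.5], [0.9, 0.1]
print(f_divergence(p, q, kl))           # = 0.5*ln(5/9) + 0.5*ln(5)
print(f_divergence(p, q, variational))  # = |0.5-0.9| + |0.5-0.1| = 0.8
```

With f(t) = |t - 1| the formula collapses to Σ|p(x) - q(x)|, the variational distance; this is the divergence the abstract's correspondence associates with the hinge loss, while the exponential loss is associated with a Hellinger-type divergence.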
