On divergences, surrogate loss functions, and decentralized detection

We develop a general correspondence between a family of loss functions that act as surrogates to 0-1 loss and the class of Ali-Silvey or f-divergence functionals. This correspondence provides the basis for choosing and evaluating various surrogate losses frequently used in statistical learning (e.g., hinge loss, exponential loss, logistic loss); conversely, it provides a decision-theoretic framework for the choice of divergences in signal processing and quantization theory. We exploit this correspondence to characterize the statistical behavior of nonparametric decentralized hypothesis testing algorithms that operate by minimizing convex surrogate loss functions. In particular, we specify the family of loss functions that are equivalent to 0-1 loss in the sense of producing the same quantization rules and discriminant functions.
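
As a rough illustration of the loss/divergence correspondence, the sketch below checks numerically, on a small discrete example, that the optimal hinge risk equals one minus a variational distance and the optimal exponential risk equals one minus a squared Hellinger-type distance. Everything here is an assumption made for illustration and not taken from the abstract: the notation mu(z) = P(Y=+1, Z=z) and pi(z) = P(Y=-1, Z=z), the probability values, the grid search, and the particular normalizations of the two distances as defined in the code.

```python
import math

def optimal_phi_risk(mu, pi, phi, grid):
    """Sum over z of min_a [mu(z)*phi(a) + pi(z)*phi(-a)], with a searched on a grid."""
    return sum(
        min(m * phi(a) + p * phi(-a) for a in grid)
        for m, p in zip(mu, pi)
    )

def hinge(a):
    return max(0.0, 1.0 - a)

def exponential(a):
    return math.exp(-a)

# Hypothetical joint masses over three quantizer outputs z (they sum to 1).
mu = [0.30, 0.15, 0.05]   # assumed P(Y=+1, Z=z)
pi = [0.10, 0.15, 0.25]   # assumed P(Y=-1, Z=z)

# Grid of candidate discriminant values a in [-5, 5] with step 0.001.
grid = [i / 1000.0 for i in range(-5000, 5001)]

# Divergences, with normalizations chosen for this sketch.
variational = sum(abs(m - p) for m, p in zip(mu, pi))
hellinger_sq = sum((math.sqrt(m) - math.sqrt(p)) ** 2 for m, p in zip(mu, pi))

hinge_risk = optimal_phi_risk(mu, pi, hinge, grid)
exp_risk = optimal_phi_risk(mu, pi, exponential, grid)

print(hinge_risk, 1.0 - variational)
print(exp_risk, 1.0 - hellinger_sq)
```

In each printed pair the two numbers should agree up to the grid resolution, which suggests how minimizing a surrogate risk over discriminant functions amounts to maximizing an associated divergence between the two class-conditional measures.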
