Paper information - Learning in decentralized systems: a nonparametric approach

Learning in decentralized systems: a nonparametric approach

Rapid advances in information technology have led to increased deployment of decentralized decision-making systems embedded within large-scale infrastructure consisting of data collection and processing devices. In such a system, each statistical decision is made on the basis of a limited amount of data because of constraints imposed by the decentralized system. For instance, the constraints may be imposed by limits on energy, communication bandwidth, computation, or time budget. A fundamental research problem arising in decentralized systems is the development of methods that take into account not only the statistical accuracy of decision-making procedures but also the constraints imposed by system limits. It is this general problem that drives the focus of this thesis. In particular, we focus on the development and analysis of statistical learning methods for decentralized decision-making by employing a nonparametric approach. The nonparametric approach imposes very few a priori assumptions on the data; this flexibility makes it applicable to a wide range of applications. Coupled with tools from convex analysis and empirical process theory, we develop computationally efficient algorithms and analyze their statistical behavior both theoretically and empirically. Our specific contributions include the following. We develop a novel kernel-based algorithm for centralized detection and estimation in ad hoc sensor networks through the challenging task of sensor mote localization. Next, we develop and analyze a nonparametric decentralized detection algorithm using the methodology of convex surrogate loss functions and marginalized kernels. The analysis of this algorithm leads to an in-depth study of the correspondence between the class of surrogate loss functions widely used in statistical machine learning and the class of divergence functionals widely used in information theory. This correspondence allows us to provide a decision-theoretic justification for a given choice of divergence functional, a choice that often arises from asymptotic analysis. In addition, this correspondence motivates the development and analysis of a novel M-estimation procedure for estimating divergence functionals and the likelihood ratio. Finally, we investigate a sequential setting of the decentralized detection algorithm and settle an open question regarding the characterization of optimal decision rules in such a setting.

Michael I. Jordan | XuanLong Nguyen | X. Nguyen
