Parzen Windows: Simplest Regularization Algorithm

Nonparametric learning methods such as Parzen Windows have been applied to a variety of density estimation and classification problems. In this chapter we derive a “simplest” regularization algorithm and establish its close relationship with Parzen Windows. We derive a finite-sample error bound for the “simplest” regularization algorithm. Because of the close relationship between the “simplest” algorithm and Parzen Windows, this analysis provides interesting insight into Parzen Windows from the viewpoint of learning theory. Our work is a realization of the design principle of Dynamic Data Driven Applications Systems (DDDAS) introduced in Chapter 1. Finally, we provide empirical results on the performance of the “simplest” regularization algorithm (Parzen Windows) and of other methods, such as nearest neighbor classifiers and the regularization algorithm, on a number of real data sets. These results corroborate our theoretical analysis well.
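The chapter's derivation of the “simplest” regularization algorithm is not reproduced in this abstract, but for a concrete picture of the Parzen Windows classifier it refers to, the following is a minimal sketch. It assumes a Gaussian window function with a single shared bandwidth h; the function name, bandwidth value, and toy data are illustrative assumptions, not the chapter's prescribed setup.

```python
import numpy as np

def parzen_window_classify(X_train, y_train, X_test, h=1.0):
    """Assign each test point to the class with the largest Gaussian
    Parzen-window score; with a shared bandwidth h this corresponds to
    a Bayes decision under empirical class priors."""
    classes = np.unique(y_train)
    scores = np.zeros((X_test.shape[0], classes.size))
    for j, c in enumerate(classes):
        Xc = X_train[y_train == c]  # training points of class c
        # squared Euclidean distances, shape (n_test, n_c)
        sq_dists = ((X_test[:, None, :] - Xc[None, :, :]) ** 2).sum(axis=2)
        # summed Gaussian kernel values; the common normalizing constant
        # cancels across classes and is omitted
        scores[:, j] = np.exp(-sq_dists / (2.0 * h ** 2)).sum(axis=1)
    return classes[scores.argmax(axis=1)]

# toy usage: two well-separated Gaussian blobs
rng = np.random.default_rng(0)
X_train = np.vstack([rng.normal(loc=-2.0, size=(50, 2)),
                     rng.normal(loc=+2.0, size=(50, 2))])
y_train = np.array([0] * 50 + [1] * 50)
print(parzen_window_classify(X_train, y_train,
                             np.array([[-2.0, -2.0], [2.0, 2.0]]), h=1.0))
```

The bandwidth h plays the role of the smoothing (regularization) parameter: small values make the decision rule behave like a nearest neighbor classifier, while large values smooth the class-conditional density estimates toward their global means.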
