On the Quantization Error in SOM vs. VQ: A Critical and Systematic Study

The self-organizing map (SOM) is related to the classical vector quantization (VQ). Like in the VQ, the SOM represents a distribution of input data vectors using a finite set of models. In both methods, the quantization error (QE) of an input vector can be expressed, e.g., as the Euclidean norm of the difference of the input vector and the best-matching model. Since the models are usually optimized in the VQ so that the sum of the squared QEs is minimized for the given set of training vectors, a common notion is that it will be impossible to find models that produce a smaller rms QE. Therefore it has come as a surprise that in some cases the rms QE of a SOM can be smaller than that of a VQ with the same number of models and the same input data. This effect may manifest itself if the number of training vectors per model is on the order of small integers and the testing is made with an independent set of test vectors. An explanation seems to ensue from statistics. Each model vector in the VQ is determined as the average of those training vectors that are mapped into the same Voronoi domain as the model vector. On the contrary, each model vector of the SOM is determined as a weighted average of all of those training vectors that are mapped into the "topological" neighborhood around the corresponding model. The number of training vectors mapped into the neighborhood of a SOM model is generally much larger than that mapped into a Voronoi domain around a model in the VQ. Since the SOM model vectors are then determined with a significantly higher statistical accuracy, the Voronoi domains of the SOM are significantly more regular, and the resulting rms QE may then be smaller than in the VQ. However, the effective dimensionality of the vectors must also be sufficiently high.

[1]  Esa Alhoniemi,et al.  Self-Organizing Map for Data Mining in MATLAB: The SOM Toolbox , 1999 .

[2]  Kenneth Falconer,et al.  Fractal Geometry: Mathematical Foundations and Applications , 1990 .

[3]  Hinrich Schütze,et al.  Book Reviews: Foundations of Statistical Natural Language Processing , 1999, CL.

[4]  J.-C. Fort,et al.  Stochastic on-line algorithm versus batch algorithm for quantization and self organizing maps , 2001, Neural Networks for Signal Processing XI: Proceedings of the 2001 IEEE Signal Processing Society Workshop (IEEE Cat. No.01TH8584).

[5]  Ron Cole,et al.  The ISOLET spoken letter database , 1990 .

[6]  Dominik R. Dersch,et al.  Asymptotic level density in topological feature maps , 1995, IEEE Trans. Neural Networks.

[7]  Allen Gersho,et al.  Asymptotically optimal block quantization , 1979, IEEE Trans. Inf. Theory.

[8]  Allen Gersho,et al.  On the structure of vector quantizers , 1982, IEEE Trans. Inf. Theory.

[9]  Gilles Pagès,et al.  Theoretical aspects of the SOM algorithm , 1998, Neurocomputing.

[10]  Helge Ritter Asymptotic level density for a class of vector quantization processes , 1991, IEEE Trans. Neural Networks.

[11]  Robert M. Gray,et al.  An Algorithm for Vector Quantizer Design , 1980, IEEE Trans. Commun..

[12]  Fernando Bação,et al.  Self-organizing Maps as Substitutes for K-Means Clustering , 2005, International Conference on Computational Science.

[13]  Jack Dongarra,et al.  Computational Science - ICCS 2005, 5th International Conference, Atlanta, GA, USA, May 22-25, 2005, Proceedings, Part I , 2005, International Conference on Computational Science.

[14]  Paul L. Zador,et al.  Asymptotic quantization error of continuous signals and the quantization dimension , 1982, IEEE Trans. Inf. Theory.

[15]  Yiming Yang,et al.  RCV1: A New Benchmark Collection for Text Categorization Research , 2004, J. Mach. Learn. Res..

[16]  Les E. Atlas,et al.  A comparison of the LBG algorithm and Kohonen neural network paradigm for image vector quantization , 1990, International Conference on Acoustics, Speech, and Signal Processing.

[17]  Keinosuke Fukunaga,et al.  An Algorithm for Finding Intrinsic Dimensionality of Data , 1971, IEEE Transactions on Computers.

[18]  Teuvo Kohonen,et al.  Self-Organizing Maps , 2010 .