Methods for Exploratory Cluster Analysis

The Self-Organizing Map is a nonlinear projection of a high-dimensional data space. It can be used as an ordered groundwork, a two-dimensional graphical display, for visualizing structures in the data. Different locations on the display correspond to different domains of the data space in an orderly fashion. The models used in the mapping are fitted to the data so that they approximate the data distribution nonlinearly but smoothly. In this paper we introduce new methods for visualizing the cluster structures of the data on the groundwork, and for interpreting those structures in terms of the local metric properties of the map. In particular, the methods reveal which variables have the largest discriminatory power between neighboring clusters. They are especially suitable for the exploratory phase of data analysis, or preliminary data mining, in which hypotheses about the targets of the analysis are formulated. We applied the methods to a collection of patent abstract texts and found, for instance, a cluster of neural network patents that the official patent classification system does not distinguish.
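The general idea can be illustrated with a minimal sketch in Python with NumPy. It is not the method introduced in this paper: it fits a small map with the standard online update rule, computes a U-matrix-style average distance between neighboring map units as a rough view of cluster borders, and splits the squared distance between two units into per-variable shares as a simple stand-in for discriminatory power. The function names (`train_som`, `neighbor_distances`, `variable_contributions`) and all parameter values are assumptions chosen for illustration.

```python
import numpy as np


def train_som(data, rows=6, cols=6, n_iter=2000, lr0=0.5, sigma0=3.0, seed=0):
    """Fit a small rectangular SOM with the online update rule (illustrative)."""
    rng = np.random.default_rng(seed)
    n_features = data.shape[1]
    # Grid coordinates of every map unit, shape (rows * cols, 2).
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialise the model vectors from randomly chosen data samples.
    models = data[rng.integers(0, len(data), rows * cols)].astype(float).copy()
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                 # learning rate decays linearly
        sigma = sigma0 * (1.0 - frac) + 0.5     # neighborhood radius shrinks
        x = data[rng.integers(0, len(data))]
        # Best-matching unit: the model vector closest to the sample.
        bmu = np.argmin(((models - x) ** 2).sum(axis=1))
        # Gaussian neighborhood measured on the map grid, not in data space.
        d2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
        h = np.exp(-d2 / (2.0 * sigma ** 2))
        models += lr * h[:, None] * (x - models)
    return models.reshape(rows, cols, n_features)


def neighbor_distances(models):
    """Mean distance from each unit to its 4-connected grid neighbors."""
    rows, cols, _ = models.shape
    out = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            dists = []
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    dists.append(np.linalg.norm(models[r, c] - models[rr, cc]))
            out[r, c] = np.mean(dists)
    return out


def variable_contributions(models, unit_a, unit_b):
    """Share of the squared distance between two units due to each variable."""
    diff2 = (models[unit_a] - models[unit_b]) ** 2
    total = diff2.sum()
    return diff2 / total if total > 0 else diff2


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Two synthetic clusters that differ mainly in the first variable.
    a = rng.normal([0.0, 0.0, 0.0], 0.1, size=(100, 3))
    b = rng.normal([1.0, 0.0, 0.0], 0.1, size=(100, 3))
    som = train_som(np.vstack([a, b]))
    print(neighbor_distances(som).round(2))
    print(variable_contributions(som, (0, 0), (5, 5)).round(2))
```

Large values in the neighbor-distance grid suggest borders between clusters, and a large share for one variable suggests that it separates the two compared map units. Both are heuristics for exploration and do not replace the local metric analysis described in the abstract.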
