Self-Organizing Maps of Massive Databases

The Self-Organizing Map (SOM) is a computational projection method that usually maps a high-dimensional data manifold onto a regular, low-dimensional (say, 2D) grid. A model of some observation is associated with every node of the grid. The SOM algorithm computes the collection of models in such a way that an arbitrary observation is represented by its closest model with optimal average accuracy. At the same time, the models become ordered over the grid according to their mutual similarity, which creates an abstract order and allows effective browsing of the collection. Very different kinds of data can be analyzed and visualized by the SOM. The first example discussed in detail is a similarity graph of a vast number of documents, viz. seven million patent abstracts, which are ordered according to their contents. Unlike other neural-network methods, however, the SOM can also organize nonvectorial data; an example is the SOM of 77 977 protein sequences. This paper explains methods by which such huge mappings can be computed.

Keywords: Self-Organizing Map (SOM), data analysis and visualization, neural networks, mapping methods
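The abstract summarizes the algorithm only in words: every grid node holds a model, each observation is matched to its closest model, and the models are adjusted so that neighboring nodes end up with similar models. The following is a minimal sketch of the classic incremental (online) SOM, assuming Euclidean vector data, a Gaussian neighborhood on a rectangular grid, and a learning rate and neighborhood radius that both decrease linearly. The function names `train_som` and `best_matching_unit`, and all parameter values, are hypothetical choices for illustration. The sketch does not implement the fast computation methods for massive maps or the nonvectorial (protein sequence) case described in this paper.

```python
import math
import random


def best_matching_unit(models, x):
    """Return the index of the model closest to x (squared Euclidean distance)."""
    return min(range(len(models)),
               key=lambda i: sum((m - v) ** 2 for m, v in zip(models[i], x)))


def train_som(data, rows, cols, epochs=20, lr0=0.5, sigma0=None, seed=0):
    """Hypothetical online SOM: returns grid coordinates and one model per node."""
    rng = random.Random(seed)
    dim = len(data[0])
    grid = [(r, c) for r in range(rows) for c in range(cols)]
    # Random initial models in the unit hypercube (an assumption of this sketch).
    models = [[rng.random() for _ in range(dim)] for _ in grid]
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0
    total = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x in order:
            frac = step / total
            lr = lr0 * (1.0 - frac)                   # learning rate shrinks to 0
            sigma = max(sigma0 * (1.0 - frac), 0.5)   # neighborhood radius shrinks
            br, bc = grid[best_matching_unit(models, x)]
            for i, (r, c) in enumerate(grid):
                # Gaussian neighborhood measured on the grid, not in data space.
                d2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-d2 / (2.0 * sigma ** 2))
                # Move each model toward x, weighted by its grid proximity to the winner.
                models[i] = [m + lr * h * (v - m) for m, v in zip(models[i], x)]
            step += 1
    return grid, models


# Toy data: two well-separated 2D clusters mapped onto a 3 x 3 grid.
_rng = random.Random(1)
data = ([[0.1 + _rng.gauss(0, 0.05), 0.1 + _rng.gauss(0, 0.05)] for _ in range(50)]
        + [[0.9 + _rng.gauss(0, 0.05), 0.9 + _rng.gauss(0, 0.05)] for _ in range(50)])
grid, models = train_som(data, rows=3, cols=3)
```

Because the neighborhood is defined on the grid, nodes that are close on the map are pulled toward the same observations, which is what produces the similarity ordering mentioned above. The online update shown here visits every node for every observation, so its cost grows with map size times data size; that is one reason very large maps call for faster computation schemes.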