Exploration of very large databases by self-organizing maps

This paper describes a data organization system and genuine content-addressable memory called the WEBSOM. It is a two-layer self-organizing map (SOM) architecture in which documents are mapped as points on the upper map, in a geometric order that reflects the similarity of their contents. With standard browsing tools one can select from the map subsets of documents that are mutually most similar. It is also possible to submit free-form queries describing the wanted documents, whereupon the WEBSOM locates the best-matching documents. The document map exemplified in this paper has over 100,000 map nodes, each with 315 inputs, and over 1,000,000 documents have been organized by it. The system has been implemented in software on a general-purpose computer.
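To make the content-addressable lookup concrete, the following is a minimal sketch of how a query could be matched against a document map. It is not the WEBSOM implementation. It assumes that each map node holds a model vector, that documents and queries are already encoded as vectors of the same dimension, and that the best match is the node at the smallest squared Euclidean distance. All function names and the toy three-dimensional data are hypothetical, and training of the map is not shown.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_matching_node(model_vectors, vector):
    """Index of the map node whose model vector is closest to `vector`."""
    return min(
        range(len(model_vectors)),
        key=lambda i: squared_distance(model_vectors[i], vector),
    )


def assign_documents(model_vectors, doc_vectors):
    """Attach every document to its best-matching node (assumed scheme)."""
    node_to_docs = {}
    for doc_id, vec in doc_vectors.items():
        node = best_matching_node(model_vectors, vec)
        node_to_docs.setdefault(node, []).append(doc_id)
    return node_to_docs


def query_map(model_vectors, node_to_docs, query_vector):
    """Return the best-matching node and the documents stored at it."""
    node = best_matching_node(model_vectors, query_vector)
    return node, node_to_docs.get(node, [])


# Toy map: 3 nodes with 3 inputs each (the map in the paper has over
# 100,000 nodes with 315 inputs each). Values are illustrative only.
model_vectors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
doc_vectors = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.8, 0.0, 0.2],
    "doc_c": [0.1, 0.1, 0.9],
}
node_to_docs = assign_documents(model_vectors, doc_vectors)
hit_node, hit_docs = query_map(model_vectors, node_to_docs, [0.7, 0.2, 0.1])
```

In this toy setup the query lands on the same node as `doc_a` and `doc_b`, which is the sense in which the map acts as a content-addressable memory: the query content itself selects the storage location. The linear scan over all nodes is chosen for clarity. The abstract does not say how the search is organized on a map of the reported size, so nothing here should be read as describing that.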