Self-organizing maps of massive document collections

Huge document collections can be organized according to textual similarities by the self-organizing map (SOM) algorithm, when statistical representations of the textual contents are used as the feature vectors of the documents. In a practical experiment we mapped 6,840,568 patent abstracts onto a 1,002,240-node SOM. For the feature vectors we selected 500-dimensional random projections of the weighted word histograms.
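
The feature construction described above can be illustrated with a minimal sketch. It is not the authors' implementation: the Gaussian random matrix with unit-norm columns, the toy sizes (2,000 words, 50 dimensions, a 6 x 4 map), the `weights` vector, and the function names `random_projection` and `best_matching_unit` are all assumptions made for illustration. The experiment itself used 500-dimensional projections and a 1,002,240-node map.

```python
import numpy as np

def random_projection(histograms, out_dim, seed=0):
    """Project weighted word histograms (n_docs x vocab_size) down to out_dim."""
    rng = np.random.default_rng(seed)
    vocab_size = histograms.shape[1]
    # Assumption: Gaussian entries, each column normalized to unit length.
    R = rng.standard_normal((out_dim, vocab_size))
    R /= np.linalg.norm(R, axis=0, keepdims=True)
    return histograms @ R.T

def best_matching_unit(x, codebook):
    """Index of the SOM node whose model vector is nearest to x (Euclidean)."""
    return int(np.argmin(np.linalg.norm(codebook - x, axis=1)))

# Toy data standing in for a document collection (hypothetical values).
rng = np.random.default_rng(1)
counts = rng.poisson(0.05, size=(20, 2000)).astype(float)  # word histograms
weights = rng.uniform(0.5, 2.0, size=2000)                 # per-word weights
features = random_projection(counts * weights, out_dim=50)

# A small, randomly initialized 6 x 4 map; a trained SOM would replace this.
codebook = rng.standard_normal((6 * 4, 50))
bmu = best_matching_unit(features[0], codebook)
```

Here each document is placed at the map node returned by `best_matching_unit`. Training the map itself is omitted from the sketch.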
