How to make large self-organizing maps for nonvectorial data

The self-organizing map (SOM) represents an open set of input samples by a topologically organized, finite set of models. In this paper, a new version of the SOM is used for the clustering, organization, and visualization of a large database of symbol sequences (viz. protein sequences). This method combines two principles: the batch computing version of the SOM, and computation of the generalized median of symbol strings.
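As a rough illustration of how the two principles can fit together, the following minimal sketch runs one batch update on a one-dimensional map of strings. It rests on several assumptions that are not taken from the paper: plain Levenshtein distance stands in for whatever sequence similarity measure is actually used, the set median (the best candidate chosen from the pooled samples themselves) stands in for the generalized median, and the neighborhood is a simple index window. The names `levenshtein`, `set_median`, and `batch_som_step` are hypothetical.

```python
from typing import List


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit cost for insertion, deletion, and substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def set_median(strings: List[str]) -> str:
    """Member of `strings` with the smallest summed distance to all members.

    Assumption: this restricts the search to the given strings, so it only
    approximates a generalized median, which may lie outside the set.
    """
    return min(strings, key=lambda c: sum(levenshtein(c, s) for s in strings))


def batch_som_step(models: List[str], samples: List[str], radius: int) -> List[str]:
    """One batch update of a 1-D string map (hypothetical, simplified layout)."""
    # Step 1: assign every sample to its best-matching model.
    hits: List[List[str]] = [[] for _ in models]
    for s in samples:
        bmu = min(range(len(models)), key=lambda k: levenshtein(s, models[k]))
        hits[bmu].append(s)

    # Step 2: replace each model by the median of the samples collected
    # over its neighborhood (units within `radius` index positions).
    new_models = []
    for k in range(len(models)):
        lo, hi = max(0, k - radius), min(len(models), k + radius + 1)
        pooled = [s for unit in hits[lo:hi] for s in unit]
        # Keep the old model if no sample fell into the neighborhood.
        new_models.append(set_median(pooled) if pooled else models[k])
    return new_models


if __name__ == "__main__":
    models = ["aaaa", "cccc"]
    samples = ["aaab", "aaaa", "aaba", "ccdc", "cccc", "cdcc"]
    print(batch_som_step(models, samples, radius=0))
```

In a full training run one would repeat `batch_som_step` while shrinking `radius`. For a large database, the quadratic cost of `set_median` would presumably need to be reduced, which this sketch does not attempt.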
