Clustering of Human Endogenous Retrovirus Sequences with Median Self-Organizing Map

This paper studies the mutual relationships of human endogenous retroviruses (HERVs) and their similarities to other DNA elements. We demonstrate that a completely data-driven grouping can reflect the same kinds of relationships as more traditional biological classifications and phylogenetic taxonomies. The clusters and their visualization were computed with the Median Self-Organizing Map algorithm applied to pairwise FASTA-based distances. These whole-sequence distances distinguish between the different known types of endogenous elements and exogenous retroviruses, and the HERVs are grouped meaningfully.
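To make the idea of a map trained on pairwise distances concrete, the following is a minimal sketch of a batch-style median SOM, not the authors' implementation. It assumes a precomputed symmetric distance matrix `dist` (in the paper these distances are FASTA-based; here any dissimilarity matrix will do). The function name `median_som`, its parameters, and the linear neighborhood schedule are illustrative assumptions. Each map unit's prototype is constrained to be one of the data items: the item that minimizes the neighborhood-weighted sum of distances to the items mapped near that unit.

```python
import numpy as np

def median_som(dist, rows, cols, n_iter=20, sigma0=None, seed=0):
    """Simplified batch median SOM on a symmetric (n, n) distance matrix.

    Returns (protos, bmu): protos[u] is the index of the data item that
    serves as the prototype of unit u, and bmu[j] is the unit that
    item j is mapped to. All names here are hypothetical.
    """
    rng = np.random.default_rng(seed)
    n = dist.shape[0]
    n_units = rows * cols

    # Squared grid distances between map units (rectangular lattice).
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    grid_d2 = ((grid[:, None, :] - grid[None, :, :]) ** 2).sum(axis=2)

    # Initialize prototypes as randomly chosen data items.
    protos = rng.choice(n, size=n_units, replace=n_units > n)
    sigma0 = sigma0 or max(rows, cols) / 2.0

    for t in range(n_iter):
        # Assumed schedule: neighborhood width shrinks linearly to 0.5.
        frac = t / max(n_iter - 1, 1)
        sigma = sigma0 + (0.5 - sigma0) * frac
        h = np.exp(-grid_d2 / (2.0 * sigma ** 2))  # (units, units)

        # Best-matching unit for every item: nearest prototype.
        bmu = dist[:, protos].argmin(axis=1)

        # cost[u, k] = sum_j h[u, bmu[j]] * dist[j, k];
        # the new prototype of unit u is the item k with minimal cost.
        cost = h[:, bmu] @ dist
        protos = cost.argmin(axis=1)

    bmu = dist[:, protos].argmin(axis=1)
    return protos, bmu
```

Because only the distance matrix is used, the same sketch applies to sequences, strings, or any other nonvectorial data for which a pairwise dissimilarity can be computed.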
