Electronic editor: automatic content-based sequential compilation of newspaper articles

Abstract

New information carriers, such as electronic books and MP3 players, can be used to display customized content. On these carriers, however, only browsing forwards and backwards is easy. The crucial question in making them user-friendly is therefore how to construct an order of presentation that enhances readability. We have developed a tool that uses Kohonen's self-organizing map algorithm to automatically organize a collection of text articles into a meaningful, content-based sequential order. In a small-scale case study, the article sequence constructed by the system was compared with the sequences made by 21 people and found to be comparable.
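The abstract does not give the details of the pipeline, so the following is only a minimal sketch of the general idea, under stated assumptions: articles are represented as normalized bag-of-words vectors, the self-organizing map is a one-dimensional chain of units, and the reading order is obtained by sorting articles by the position of their best-matching unit. All function names and parameter values (`bag_of_words`, `train_chain_som`, `sequence_articles`, the learning-rate and radius schedules) are hypothetical and not taken from the paper.

```python
import math
import random
from collections import Counter


def bag_of_words(texts):
    """Turn each text into a length-normalized word-count vector (assumed encoding)."""
    vocab = sorted({w for t in texts for w in t.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for t in texts:
        v = [0.0] * len(vocab)
        for w, c in Counter(t.lower().split()).items():
            v[index[w]] = float(c)
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        vectors.append([x / norm for x in v])
    return vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_chain_som(vectors, n_units=None, epochs=200, seed=0):
    """Train a 1-D SOM (a chain of units) with linearly decaying rate and radius."""
    rng = random.Random(seed)
    n_units = n_units or len(vectors)
    dim = len(vectors[0])
    units = [[rng.random() * 0.1 for _ in range(dim)] for _ in range(n_units)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = 0.5 * (1.0 - frac)                          # assumed schedule
        radius = max(n_units / 2.0 * (1.0 - frac), 0.5)  # assumed schedule
        for v in rng.sample(vectors, len(vectors)):
            # Best-matching unit: the unit whose weights are closest to v.
            bmu = min(range(n_units), key=lambda u: sq_dist(units[u], v))
            for u in range(n_units):
                # Gaussian neighborhood measured along the chain.
                h = math.exp(-((u - bmu) ** 2) / (2.0 * radius ** 2))
                units[u] = [w + lr * h * (x - w) for w, x in zip(units[u], v)]
    return units


def sequence_articles(texts, **kwargs):
    """Return article indices sorted by their position on the trained chain."""
    vectors = bag_of_words(texts)
    units = train_chain_som(vectors, **kwargs)

    def position(i):
        bmu = min(range(len(units)), key=lambda u: sq_dist(units[u], vectors[i]))
        # Ties on the same unit are broken by distance to that unit.
        return (bmu, sq_dist(units[bmu], vectors[i]))

    return sorted(range(len(texts)), key=position)
```

Calling `sequence_articles(list_of_article_texts)` returns a permutation of article indices. Because neighboring units on the chain tend to hold similar weight vectors, articles with overlapping vocabulary tend to land close together in the sequence, which is the property a forwards-and-backwards reading order needs.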
