Getting Things in Order: An Introduction to the R Package seriation

Seriation, i.e., finding a suitable linear order for a set of objects given data and a loss or merit function, is a basic problem in data analysis. Because of the problem's combinatorial nature, it is hard to solve for all but very small sets. Nevertheless, both exact solution methods and heuristics are available. In this paper we present the package seriation, which provides an infrastructure for seriation with R. The infrastructure comprises data structures to represent linear orders as permutation vectors, a wide array of seriation methods with a consistent interface, a method to calculate the value of various loss and merit functions, and several visualization techniques that build on seriation. To illustrate how easily the package can be applied in a variety of applications, a comprehensive collection of examples is presented.
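
To make the problem statement concrete, the following minimal sketch (in Python, not the package's R interface) represents a linear order as a permutation vector and finds the order that minimizes one possible loss, the sum of dissimilarities between neighbouring objects. The function names `path_length` and `seriate_exhaustive` and the toy data are assumptions made for illustration only. The exhaustive search evaluates all n! permutations, which is why exact solutions of this kind are only practical for very small sets.

```python
from itertools import permutations


def path_length(dist, order):
    """Loss function: sum of dissimilarities between neighbours in the order."""
    return sum(dist[a][b] for a, b in zip(order, order[1:]))


def seriate_exhaustive(dist):
    """Return a permutation vector with minimal path length.

    Enumerates all n! permutations, so it is only feasible for very small n.
    """
    n = len(dist)
    return min(permutations(range(n)), key=lambda order: path_length(dist, order))


# Hypothetical toy data: five objects placed on a line, given in shuffled order.
points = [0.0, 5.0, 1.0, 4.0, 2.0]
dist = [[abs(p - q) for q in points] for p in points]

best = seriate_exhaustive(dist)
print("order:", best, "loss:", path_length(dist, best))
```

Because a linear order and its reverse have the same path length, the optimum is only unique up to reversal. A heuristic would replace the exhaustive `min` over permutations with a search that examines far fewer candidates, at the cost of no longer guaranteeing optimality.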
