Paper information - A Generalized Sorting Strategy for Computer Classifications

A Generalized Sorting Strategy for Computer Classifications

Agglomerative hierarchical methods of computer classification all begin by calculating distance-measures between elements. The hierarchy is then generated by subjecting these measures to a sorting-strategy, which depends essentially on the definition of a distance-measure between groups of elements. In nearest-neighbour sorting, this is defined as the distance between the closest pair of elements, one in each group. Macnaughton-Smith has pointed out that much more intense clustering can be produced by taking the most remote pair of elements (furthest-neighbour sorting). In group-average sorting [1] the distance is defined as the mean of all between-group inter-element distances; in centroid sorting it is the distance between group centroids, defined by a conventional Euclidean model. In median [2] sorting the distance of a third group from two which have just fused depends on the previous three inter-group distances in the manner of Apollonius's theorem. Although the earlier of these strategies have received some comparative assessment [1, 3–5], no attempt seems to have been made to generalize them into a single system. As a result, quite different computer strategies have commonly been used, necessitating a separate computer program for each.
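The group-distance definitions named above can be illustrated with a minimal Python sketch. It is not the authors' program: the function names, the representation of a group as a list of coordinate tuples, and the choice of Euclidean distance between elements are all assumptions made for illustration. The median rule is written as the standard Apollonius median-length identity applied to the three previous inter-group distances.

```python
import math
from itertools import product


def nearest_neighbour(a, b):
    """Distance between the closest pair of elements, one from each group."""
    return min(math.dist(p, q) for p, q in product(a, b))


def furthest_neighbour(a, b):
    """Distance between the most remote pair of elements, one from each group."""
    return max(math.dist(p, q) for p, q in product(a, b))


def group_average(a, b):
    """Mean of all between-group inter-element distances."""
    total = sum(math.dist(p, q) for p, q in product(a, b))
    return total / (len(a) * len(b))


def centroid(group):
    """Coordinate-wise mean of a group (Euclidean model)."""
    n = len(group)
    return tuple(sum(coords) / n for coords in zip(*group))


def centroid_distance(a, b):
    """Distance between the two group centroids."""
    return math.dist(centroid(a), centroid(b))


def median_update(d_hi, d_hj, d_ij):
    """Distance from group h to the fusion of i and j, from the three
    previous inter-group distances (Apollonius's theorem: the length of
    the median drawn from h to the midpoint of side i-j)."""
    return math.sqrt(0.5 * d_hi**2 + 0.5 * d_hj**2 - 0.25 * d_ij**2)


if __name__ == "__main__":
    # Hypothetical example data: two groups of two points each.
    a = [(0.0, 0.0), (0.0, 2.0)]
    b = [(3.0, 0.0), (3.0, 2.0)]
    print(nearest_neighbour(a, b))
    print(furthest_neighbour(a, b))
    print(group_average(a, b))
    print(centroid_distance(a, b))
```

The first four functions recompute each measure from the raw elements, whereas `median_update` needs only the three inter-group distances already held before the fusion.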

W. T. Williams | G. N. Lance

[1] R. Sokal, et al. Principles of Numerical Taxonomy, 1965.

[2] W. T. Williams, et al. Fundamental Problems in Numerical Taxonomy, 1966.

[3] G. N. Lance, et al. Computer Programs for Hierarchical Polythetic Classification ("Similarity Analyses"), 1966, Comput. J.

[4] W. T. Williams, et al. Multivariate Methods in Plant Ecology: V. Similarity Analyses and Information-Analysis, 1966.

[5] Robert R. Sokal, et al. A Statistical Method for Evaluating Systematic Relationships, 1958.