A Critique of Pure Hierarchy: Uncovering Cross-Cutting Structure in a Natural Dataset

How best can we understand, and visualize, the structure in multi-dimensional data? One common approach is hierarchical cluster analysis, adopted either for theoretical or for more descriptive reasons. Here, we point out that an apparently revealing hierarchical clustering solution may well be compatible with structure that is not well characterized as a hierarchy. In particular, a hierarchical description can be equally consistent with cross-cutting structure as with strictly hierarchical, or nested, structure. We offer an alternative approach based on inspection of the feature vectors provided by a singular value decomposition (SVD), which allows a flexible mixture of hierarchical and cross-cutting dimensions and can reveal whether dimensions are cross-cutting or nested. The SVD is a more flexible representation than a hierarchy in that it can capture hierarchical structure, cross-cutting structure, blends of the two, or, indeed, many other structure types. We then introduce a refinement of the SVD approach, based on sparse principal component analysis, that leads to more easily interpretable dimensions. In our dataset, these dimensions correspond to aquatic vs. land animals, large vs. small animals, predators vs. prey animals, and primates vs. other mammals.
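To make the nested vs. cross-cutting distinction concrete, the following is a minimal sketch on toy data, not the dataset or the procedure used in the paper. It builds two small item-by-feature matrices, one with a nested design and one with a cross-cutting design, and inspects the item-side singular vectors. The helper names `item_dimensions` and `relation`, the toy matrices, and the rule used to label a pair of dimensions are all assumptions introduced for illustration.

```python
import numpy as np

def item_dimensions(X, k):
    """Return the first k item-side singular vectors and singular values of X."""
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :k], s[:k]

def relation(u_high, u_low, tol=1e-8):
    """Illustrative rule (an assumption, not the paper's criterion):
    a lower dimension is 'nested' if every item it loads on lies on one
    side of the higher dimension; otherwise it is 'cross-cutting'."""
    support = np.abs(u_low) > tol          # items the lower dimension distinguishes
    signs = np.sign(u_high[support])       # where those items sit on the higher dimension
    return "nested" if np.all(signs == signs[0]) else "cross-cutting"

# Four hypothetical items; three orthonormal feature patterns.
e1, e2, e3 = np.eye(3)

a = np.array([1.0, 1.0, -1.0, -1.0])   # top-level split: items {0,1} vs {2,3}
b = np.array([1.0, -1.0, 1.0, -1.0])   # second split that cuts across the first
c = np.array([1.0, -1.0, 0.0, 0.0])    # split inside the first branch only
d = np.array([0.0, 0.0, 1.0, -1.0])    # split inside the second branch only

# Cross-cutting design: two independent binary dimensions (a 2 x 2 layout).
X_cross = 3 * np.outer(a, e1) + 2 * np.outer(b, e2)

# Nested design: one top-level split, then one split within each branch.
X_nested = 3 * np.outer(a, e1) + 2 * np.outer(c, e2) + 1 * np.outer(d, e3)

U_cross, _ = item_dimensions(X_cross, 2)
U_nested, _ = item_dimensions(X_nested, 3)

print("cross design, dim 2 vs dim 1:", relation(U_cross[:, 0], U_cross[:, 1]))
print("nested design, dim 2 vs dim 1:", relation(U_nested[:, 0], U_nested[:, 1]))
print("nested design, dim 3 vs dim 1:", relation(U_nested[:, 0], U_nested[:, 2]))
```

In the nested design, each lower dimension has nonzero loadings only on items within a single branch of the first dimension. In the cross-cutting design, the second dimension separates items on both sides of the first. Real data are noisy, so loadings are rarely exactly zero; a sparsity-inducing method such as sparse principal component analysis is one way to obtain loadings that can be read off in this manner.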
