A Nonparametric Bayesian Model of Visual Short-Term Memory

A. Emin Orhan (eorhan@bcs.rochester.edu)
Robert A. Jacobs (robert.jacobs@rochester.edu)
Department of Brain & Cognitive Sciences, University of Rochester, Rochester NY 14627, USA

Abstract

We present a nonparametric Bayesian model of the organization of visual short-term memory based on the Dirichlet process mixture model. Our model implements the idea that items in visual short-term memory can be encoded at multiple levels of abstraction, where the appropriate levels of abstraction and how much weight should be given to each level can be determined automatically. A capacity limit is implemented in this model by favoring small numbers of clusters of items. We show that various biases and distortions reported in the visual short-term recall and recognition memory literatures can be quite naturally and elegantly explained by the model.

Keywords: Visual short-term memory; chunking; Dirichlet process mixture model; nonparametric Bayesian model.

Introduction

In a standard visual short-term memory (VSTM) experiment, subjects view a display containing multiple items with simple features (e.g., colored squares or oriented Gabor gratings) for a brief period of time and, after a delay period, are asked to report on one of those items. This procedure allows researchers to address a number of important questions: What is the exact content of subjects' visual memory for the display at the end of the delay period, and how is this content organized in VSTM? Are the items encoded independently in VSTM, or do memories for different items influence one another? If so, how?

There is substantial evidence suggesting that items in VSTM are not encoded independently of one another (Brady & Alvarez, 2010; Huang & Sekuler, 2010; Jiang, Olson, & Chun, 2000). As a specific form of such dependence between the representations of different items in VSTM, it has recently been argued that VSTM is organized hierarchically, with each item being represented at two different scales: a fine scale (i.e., individually) and a coarse scale, through the ensemble statistics of all items in the display (Brady & Alvarez, 2010; Brady & Tenenbaum, 2010; although see our discussion of Brady and Alvarez (in press) below).

In this paper, we present a generalization of this idea based on the Dirichlet process mixture model (DPMM) (Neal, 2000). The DPMM is a popular nonparametric model that can describe a dataset in terms of a probability distribution over its different possible partitions. Through the use of multiple partitions, our model can represent an item in VSTM not just at two levels of abstraction, as proposed by Brady and Alvarez (2010) and Brady and Tenenbaum (2010), but at multiple levels of abstraction, including intermediate levels. For instance, in one partition, an item might form its own group (i.e., a fine-scale representation of the item). In another partition, this item might be grouped with one other item that is highly similar to it (a moderate-scale representation). And in a third partition, the item might be grouped with all other items (a coarse-scale representation). The advantage of the DPMM is that it can automatically determine the appropriate partitions for the particular dataset at hand and the weights that should be allotted to each partition in the posterior distribution.

Details of the Model

Consider a single trial of a hypothetical VSTM experiment in which an observer needs to remember the feature values (e.g., position of a square or orientation of a Gabor grating) of N items in a display. We denote the actual feature value of item i by θ_i. One of the items, called the target item, is then cued and the observer is asked to report its feature value. The index of the target item is denoted by t and its feature value by θ_t. We model this single trial using a non-conjugate DPMM that assumes the following generative process (Neal, 2000):

θ_i | µ_i, τ_i ∼ N(θ_i; µ_i, τ_i)
µ_i, τ_i | G ∼ G
G ∼ DP(G_0, α)
G_0(µ, τ) = U(µ; a, b) · G(τ; α_τ, β_τ)

Here, µ_i and τ_i are the mean and precision of the Gaussian component (or cluster) that item i is assigned to, and they are identical for different i if the corresponding items are assigned to the same component. N(θ; µ, τ) is a normal distribution with mean µ and precision τ, and DP(G_0, α) is a Dirichlet process with base distribution G_0 and concentration parameter α. Roughly, α acts as a capacity parameter in our model: for small values of this parameter, the model favors a small number of clusters or groups of similar items ("chunks"), whereas for large values, it tends to assign each item to its own cluster. We place a G(α_c, 1) prior on the concentration parameter α, treating α_c as a free parameter. U(µ; a, b) is a uniform distribution over the range (a, b), and G(τ; α_τ, β_τ) is a gamma distribution with parameters α_τ and β_τ. We set the range of the uniform distribution to a large interval including the minimum and maximum possible values of the relevant variable in each experiment below. For the parameters of the gamma distribution, α_τ and β_τ, we put a G(1, 1) prior on β_τ and treat α_τ as a free parameter. This reduces the total number of free parameters to just two: α_c for the concentration parameter and α_τ for the precision. We use the same parameter values α_τ, α_c for all trials of an experiment. Inference is performed via a Markov chain Monte Carlo (MCMC) sampling algorithm with auxiliary variables (Algorithm 8 in Neal (2000)).

Figure 1 schematically illustrates, in a hypothetical one-dimensional example, the idea of representing an item at multiple levels of abstraction with a DPMM, using the feature values θ_i of three different items shown in a single trial of a hypothetical VSTM experiment.
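
To make the notion of a posterior distribution over partitions concrete, the following minimal Python sketch enumerates all five partitions of a hypothetical three-item display and scores each one under the model: the prior over partitions is the Chinese restaurant process induced by DP(G_0, α), and each cluster's marginal likelihood under G_0 is approximated by simple Monte Carlo integration. All numerical settings here (the feature values and a, b, α_τ, β_τ, and a fixed α) are illustrative choices for this sketch, not values fitted in any experiment.

```python
import numpy as np
from scipy.stats import norm, gamma
from math import factorial, log

rng = np.random.default_rng(0)

# Illustrative settings, not fitted values.
a, b = 0.0, 180.0                  # range of the uniform base over cluster means
alpha_tau, beta_tau = 1.0, 100.0   # gamma base over cluster precisions
alpha = 1.0                        # concentration parameter (fixed here)

theta = np.array([40.0, 47.0, 130.0])   # hypothetical three-item display

# All five set partitions of a three-item display.
partitions = [
    [[0], [1], [2]],     # fine scale: each item in its own cluster
    [[0, 1], [2]],       # moderate scale: the two similar items chunked
    [[0, 2], [1]],
    [[1, 2], [0]],
    [[0, 1, 2]],         # coarse scale: one chunk for the whole display
]

def crp_log_prior(part, alpha, n):
    """Log probability of a partition under the Chinese restaurant process."""
    lp = len(part) * log(alpha)
    lp += sum(log(factorial(len(c) - 1)) for c in part)
    lp -= sum(log(alpha + i) for i in range(n))
    return lp

def cluster_log_marginal(x, n_mc=20000):
    """Monte Carlo estimate of log ∫ Π_i N(x_i; µ, τ) dG0(µ, τ)."""
    mu = rng.uniform(a, b, size=n_mc)
    tau = gamma.rvs(alpha_tau, scale=1.0 / beta_tau, size=n_mc, random_state=rng)
    ll = norm.logpdf(x[:, None], loc=mu, scale=1.0 / np.sqrt(tau)).sum(axis=0)
    m = ll.max()
    return m + np.log(np.mean(np.exp(ll - m)))      # stable log-mean-exp

log_post = []
for part in partitions:
    lp = crp_log_prior(part, alpha, len(theta))
    lp += sum(cluster_log_marginal(theta[np.array(c)]) for c in part)
    log_post.append(lp)

post = np.exp(np.array(log_post) - max(log_post))
post /= post.sum()
for part, p in zip(partitions, post):
    print(f"{str(part):22s} posterior weight = {p:.3f}")
```

For these illustrative settings, the moderate-scale partition that chunks the two similar items (at 40 and 47) should receive substantial weight alongside the fine-scale partition, which is exactly the multiple-levels-of-abstraction behavior described above.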
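The capacity-limiting role of α can be illustrated just as directly. Under the Chinese restaurant process representation of the Dirichlet process, each successive item joins an existing chunk with probability proportional to that chunk's current size and opens a new chunk with probability proportional to α. The short simulation below (again a sketch with arbitrary settings of our own choosing) shows how the expected number of chunks for a six-item display grows with α.

```python
import numpy as np

rng = np.random.default_rng(2)

def crp_num_chunks(n_items, alpha):
    """Number of chunks a CRP with concentration alpha induces on n_items items."""
    counts = []                               # current chunk sizes
    for _ in range(n_items):
        probs = np.array(counts + [alpha], dtype=float)
        k = rng.choice(len(probs), p=probs / probs.sum())
        if k == len(counts):
            counts.append(1)                  # item opens a new chunk
        else:
            counts[k] += 1                    # item joins an existing chunk
    return len(counts)

for alpha in (0.1, 1.0, 10.0):
    ks = [crp_num_chunks(6, alpha) for _ in range(5000)]
    print(f"alpha = {alpha:5.1f}: mean number of chunks = {np.mean(ks):.2f}")
```

Small α thus yields a few large chunks (a strong capacity limit), while large α pushes the model toward one cluster per item.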
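Finally, the sketch below illustrates the kind of MCMC inference the model requires. It is a minimal, single-feature rendering of Algorithm 8 of Neal (2000) written for illustration, not the implementation used in the experiments: the hyperparameter updates for α and β_τ described above are omitted, and all numerical settings are arbitrary. Each sweep resamples the cluster assignments using auxiliary components drawn from G_0, then updates each occupied cluster's parameters by Gibbs steps (the precision is conditionally conjugate given the mean, and the mean given the precision is a normal truncated to (a, b) because of the uniform base).

```python
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(1)

# Illustrative settings, not fitted values.
a, b = 0.0, 180.0                  # range of the uniform base over cluster means
alpha_tau, beta_tau = 1.0, 100.0   # gamma base over cluster precisions
alpha = 1.0                        # concentration parameter (fixed here)
M_AUX = 3                          # number of auxiliary components (Neal's m)

def draw_from_base():
    """One draw (mu, tau) from the base distribution G0."""
    return (rng.uniform(a, b), rng.gamma(alpha_tau, 1.0 / beta_tau))

def log_lik(x, mu, tau):
    """Log density of N(x; mu, tau) with tau a precision."""
    return 0.5 * np.log(tau / (2 * np.pi)) - 0.5 * tau * (x - mu) ** 2

def gibbs_sweep(theta, z, params):
    """One sweep of Algorithm 8: resample assignments, then cluster parameters."""
    for i in range(len(theta)):
        old = z[i]
        z[i] = -1                                   # remove item i
        labels = sorted(l for l in set(z) if l >= 0)
        aux = [draw_from_base() for _ in range(M_AUX)]
        if old not in labels:                       # item i was a singleton:
            aux[0] = params.pop(old)                # reuse its old parameters
        logw = [np.log(np.sum(z == l)) + log_lik(theta[i], *params[l])
                for l in labels]
        logw += [np.log(alpha / M_AUX) + log_lik(theta[i], *p) for p in aux]
        w = np.exp(np.array(logw) - np.max(logw))
        k = rng.choice(len(w), p=w / w.sum())
        if k < len(labels):                         # join an existing cluster
            z[i] = labels[k]
        else:                                       # open a new cluster
            new = max(params, default=-1) + 1
            params[new] = aux[k - len(labels)]
            z[i] = new
    for l in list(params):                          # resample cluster parameters
        x = theta[z == l]
        mu, tau = params[l]
        # tau | mu, data is gamma (conditionally conjugate)
        tau = rng.gamma(alpha_tau + len(x) / 2,
                        1.0 / (beta_tau + 0.5 * np.sum((x - mu) ** 2)))
        # mu | tau, data is normal truncated to (a, b) by the uniform base
        sd = 1.0 / np.sqrt(len(x) * tau)
        mu = truncnorm.rvs((a - x.mean()) / sd, (b - x.mean()) / sd,
                           loc=x.mean(), scale=sd, random_state=rng)
        params[l] = (mu, tau)
    return z, params

theta = np.array([40.0, 47.0, 130.0])   # hypothetical three-item display
z = np.zeros(len(theta), dtype=int)     # start with all items in one chunk
params = {0: draw_from_base()}
for _ in range(500):
    z, params = gibbs_sweep(theta, z, params)
print("sampled partition:", z)
print("cluster means:", {l: round(mu, 1) for l, (mu, tau) in params.items()})
```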

References

[1] M. Chun, et al. (2000). Organization of visual short-term memory. Journal of Experimental Psychology: Learning, Memory, and Cognition.

[2] R. M. Neal (2000). Markov chain sampling methods for Dirichlet process mixture models. Journal of Computational and Graphical Statistics.

[3] R. Sekuler, et al. (2002). Recognizing spatial patterns: A noisy exemplar approach. Vision Research.

[4] W. Ma, et al. (2004). A detection theory account of change detection. Journal of Vision.

[5] R. Sekuler, et al. (2005). Lure-similarity affects visual episodic recognition: Detailed tests of a noisy exemplar model.

[6] T. F. Brady, et al. (2009). Ensemble statistics of a display influence the representation of items in visual working memory.

[7] C. E. Rasmussen, et al. (2010). Dirichlet process Gaussian mixture models: Choice of the base distribution. Journal of Computer Science and Technology.

[8] R. Sekuler, et al. (2010). Distortions in recall from visual memory: Two classes of attractors at work. Journal of Vision.

[9] R. Catrambone, et al. (2010). Proceedings of the 32nd Annual Conference of the Cognitive Science Society.

[10] R. Sekuler, et al. (2010). Homogeneity computation: How interitem similarity in visual short-term memory alters recognition. Psychonomic Bulletin & Review.

[11] T. F. Brady, et al. (2010). Encoding higher-order structure in visual working memory: A probabilistic model.

[12] T. F. Brady, et al. (2010). Hierarchical encoding in visual working memory. Psychological Science.