Learning sparse, overcomplete representations of time-varying natural images

I show how to adapt an overcomplete dictionary of space-time functions so as to represent time-varying natural images with maximum sparsity. The basis functions form part of a probabilistic model of image sequences, with a sparse prior imposed over the coefficients. Learning is accomplished by maximizing the log-likelihood of the model, using natural movies as training data. The basis functions that emerge are space-time inseparable functions that resemble the motion-selective receptive fields of simple cells in mammalian visual cortex. When the coefficients are computed via matching pursuit in space and time, one obtains a punctate, spike-like representation of continuous time-varying images. It is suggested that such a coding scheme may be at work in the visual cortex.
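As a rough illustration of the inference step described above, the sketch below shows a greedy matching pursuit over a space-time dictionary: each basis function is correlated with the residual at every temporal offset, the best match is recorded as a "spike" (basis index, onset time, amplitude), and its contribution is subtracted. This is a minimal numpy sketch under assumed shapes and names (`movie`, `phi`, `n_iter` are illustrative), not the paper's implementation; the learning of the dictionary itself is omitted.

```python
import numpy as np

def spacetime_matching_pursuit(movie, phi, n_iter=50):
    """Greedy matching pursuit of `movie` onto space-time basis functions.

    movie : ndarray, shape (Tm, H, W) -- time-varying image patch
    phi   : ndarray, shape (K, T, H, W) -- unit-norm space-time basis functions
    Returns a list of spikes (basis index, onset frame, coefficient) and the residual.
    """
    residual = movie.astype(float).copy()
    Tm = movie.shape[0]
    K, T = phi.shape[0], phi.shape[1]
    spikes = []
    for _ in range(n_iter):
        # Correlate every basis function with the residual at every onset time.
        best_c, best_idx = 0.0, None
        for i in range(K):
            for t in range(Tm - T + 1):
                c = np.sum(residual[t:t + T] * phi[i])  # inner product
                if abs(c) > abs(best_c):
                    best_c, best_idx = c, (i, t)
        if best_idx is None or abs(best_c) < 1e-6:
            break  # residual no longer explained by any basis function
        i, t = best_idx
        # Record the spike and subtract its contribution from the residual.
        spikes.append((i, t, best_c))
        residual[t:t + T] -= best_c * phi[i]
    return spikes, residual

# Illustrative usage with random stand-ins for a learned dictionary and a movie clip.
rng = np.random.default_rng(0)
phi = rng.standard_normal((8, 5, 12, 12))
phi /= np.sqrt((phi ** 2).sum(axis=(1, 2, 3), keepdims=True))  # unit norm
movie = rng.standard_normal((30, 12, 12))
spikes, residual = spacetime_matching_pursuit(movie, phi, n_iter=20)
print(len(spikes), "spikes; residual energy:", float((residual ** 2).sum()))
```

The resulting list of (index, time, amplitude) triples is what gives the representation its punctate, spike-like character: most basis functions are silent at most times, and activity is concentrated in a small number of discrete events.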