Transformation groups, such as translations or rotations, effectively express part of the variability observed in many recognition problems. The group structure enables the construction of invariant signal representations with appealing mathematical properties, where convolutions, together with pooling operators, bring stability against additive and geometric perturbations of the input. Although physical transformation groups are ubiquitous in image and audio applications, they do not account for all the variability of complex signal classes.
We show that the invariance properties built by deep convolutional networks can be cast as a form of stable group invariance. The network wiring architecture determines the invariance group, while the trainable filter coefficients characterize the group action. We give explanatory examples that illustrate how the network architecture controls the resulting invariance group. We also explore the principle by which additional convolutional layers induce a group factorization, enabling more abstract, powerful invariant representations.
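The core mechanism, convolution followed by pooling over a group, can be sketched for the simplest case, the translation group on a 1-D signal. This is a minimal illustration (the signal and filter values are made up for the example, not taken from the paper): because circular convolution commutes with translation, pooling the rectified response over every shift yields a representation that is exactly invariant to circular translations of the input.

```python
import numpy as np

def circular_conv(x, w):
    # Circular convolution via the FFT; translating x merely
    # translates the output, so the translation group action commutes with it.
    w_pad = np.zeros_like(x)
    w_pad[: len(w)] = w
    return np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(w_pad)))

def invariant_rep(x, w):
    # Rectify, then pool (sum) over all translations: since a shift of x
    # only permutes the convolution output, the pooled value is unchanged.
    return np.abs(circular_conv(x, w)).sum()

rng = np.random.default_rng(0)
x = rng.standard_normal(32)  # illustrative 1-D signal
w = rng.standard_normal(5)   # illustrative filter (stands in for trained coefficients)

# The representation agrees for the signal and any circular shift of it.
print(np.isclose(invariant_rep(x, w), invariant_rep(np.roll(x, 7), w)))  # True
```

Replacing the translation group with rotations or other group actions follows the same pattern: convolve along the group, rectify, and pool over group elements.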