Deep belief networks

The important aspect of this layer-wise training procedure is that, provided the number of features per layer does not decrease, [6] showed that each extra layer increases a variational lower bound on the log probability of data. So layer-by-layer training can be repeated several times1 to learn a deep, hierarchical model in which each layer of features captures strong high-order correlations between the activities of features in the layer below. We will discuss three ideas based on greedily learning a hierarchy of features: