Scaling laws in learning of classification tasks.

The effect of the structure of the input distribution on the complexity of learning a pattern classification task is investigated. Using statistical mechanics, we study the performance of a winner-take-all machine at learning to classify points generated by a mixture of $K$ Gaussian distributions ("clusters") in $\mathbb{R}^N$ with intercluster distance $u$ (relative to the cluster width). In the large-separation limit $u \gg 1$, the number of examples required for learning scales as $NKu^{-p}$, where the exponent $p$ is 2 for zero-temperature Gibbs learning and 4 for the Hebb rule.
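
To make the setup concrete, the following minimal sketch (not the paper's code; all parameter values and the center-placement scheme are illustrative assumptions) draws labeled points from a mixture of $K$ unit-width Gaussian clusters in $\mathbb{R}^N$ whose centers are of order $u$ apart, trains one prototype per cluster with the Hebb rule (each prototype is the sum of its class's examples), and classifies by winner-take-all, i.e., by the largest overlap $w_k \cdot x$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters (assumptions, not values from the paper):
N, K, u = 50, 4, 5.0  # input dimension, number of clusters, separation

# Random cluster centers scaled to norm u, so typical intercluster
# distances are of order u relative to the unit cluster width.
centers = rng.standard_normal((K, N))
centers *= u / np.linalg.norm(centers, axis=1, keepdims=True)

def sample(P):
    """Draw P points from the mixture; the label is the generating cluster."""
    labels = rng.integers(K, size=P)
    x = centers[labels] + rng.standard_normal((P, N))  # unit-width Gaussians
    return x, labels

def hebb_train(x, labels):
    """Hebb rule: prototype w_k is the sum of the examples labeled k."""
    w = np.zeros((K, N))
    np.add.at(w, labels, x)
    return w

def wta_error(w, x, labels):
    """Winner-take-all: assign each point to the prototype with largest overlap."""
    pred = np.argmax(x @ w.T, axis=1)
    return np.mean(pred != labels)

# Generalization error of the Hebb rule versus the number of examples P.
for P in (10, 100, 1000):
    x_train, y_train = sample(P)
    w = hebb_train(x_train, y_train)
    x_test, y_test = sample(5000)
    print(f"P={P:5d}  test error={wta_error(w, x_test, y_test):.3f}")
```

Sweeping $P$ at several values of $u$ and locating where the test error drops would give an empirical check of the $NKu^{-p}$ scaling; the sketch above only sets up the data model and the Hebb/winner-take-all learner.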