When Are k-Nearest Neighbor and Back Propagation Accurate for Feasible Sized Sets of Examples?

We first review, in pedagogical fashion, previous results that gave lower and upper bounds on the number of examples needed for training feedforward neural networks when valid generalization is desired. Experimental tests of generalization versus the number of examples are then presented for random target networks and examples drawn from a uniform distribution. The experimental results are roughly consistent with the following heuristic: if a database of M examples is loaded onto a net with W weights (for M ≫ W), one expects to make a fraction ε = W/M of errors in classifying future examples drawn from the same distribution. This is consistent with our previous bounds and, if reliable, strengthens them in three ways: (1) the bounds had large numerical constants and log factors, all of which are set equal to one in the heuristic; (2) the previous lower bounds on the number of examples needed were valid only in a distribution-independent context, whereas the experiments were conducted for a uniform distribution; and (3) the previous lower bound was valid only for nets with one hidden layer. These experiments also seem to indicate that networks with two hidden layers have Vapnik-Chervonenkis dimension roughly equal to their total number of weights.
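The heuristic stated above amounts to a back-of-the-envelope calculation. The Python sketch below (not part of the original paper; the weight count and target error are chosen purely for illustration) shows how it would be applied: the expected error falls as W/M, and conversely roughly W/ε examples are needed to reach a target error ε.

```python
# Minimal sketch of the abstract's heuristic (illustrative only):
# a net with W weights trained on M >> W examples is expected to
# misclassify roughly a fraction eps = W / M of future examples
# drawn from the same distribution.

def heuristic_error(num_weights: int, num_examples: int) -> float:
    """Rough expected fraction of misclassified future examples (W / M)."""
    if num_examples <= num_weights:
        raise ValueError("heuristic assumes M >> W")
    return num_weights / num_examples

def examples_needed(num_weights: int, target_error: float) -> int:
    """Rough number of training examples needed for a target error (W / eps)."""
    return int(num_weights / target_error)

if __name__ == "__main__":
    W = 1_000  # hypothetical network with 1,000 weights
    print(heuristic_error(W, 20_000))   # ~0.05 expected error fraction
    print(examples_needed(W, 0.05))     # ~20,000 examples for 5% error
```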
