Data locality and load balancing for parallel neural network learning

Compilers for neural network learning algorithms can achieve near-optimal co-locality of data and processes and near-optimal load balancing over processors, even for irregular problems. This is impossible for general programs, but restricting programs to this particular problem domain allows the exploitation of domain-specific properties: the operations performed by neural algorithms are broadcasts, reductions, and object-local operations only; the load distribution is regular with respect to the (possibly irregular) network topology; and changes of network topology occur only from time to time. Compilation techniques and a compiler implementation for the MasPar MP-1 are described, and quantitative results for the effects of the various optimizations used in the compiler are given. Experiments with weight-pruning algorithms yielded speedups of 28% due to load balancing and of 195% due to data locality. Two further optimizations, connection allocation and selecting the number of replicates, speed programs up by about 50% and 100%, respectively.
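As a concrete illustration of the operation restriction the compiler exploits, the sketch below (Python, not the paper's CuPit implementation; the names Unit, Connection, and forward_step are illustrative assumptions) shows how one forward-pass step of an irregular, pruned network decomposes into exactly the three operation kinds named above: each unit's output is broadcast along its outgoing connections, each unit reduces its weighted inputs by summation, and the activation update is an object-local operation.

    import math
    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class Unit:
        output: float = 0.0
        # Irregular fan-in: after pruning, units have differing numbers
        # of incoming connections.
        incoming: List["Connection"] = field(default_factory=list)

    @dataclass
    class Connection:
        source: Unit   # the unit whose broadcast output this connection carries
        weight: float

    def forward_step(units: List[Unit]) -> None:
        for u in units:
            # Broadcast: each source unit's output is read by every
            # connection that references it; no other inter-unit
            # communication occurs.
            # Reduction: each unit sums its weighted inputs, a fan-in
            # reduction whose cost varies per unit after pruning.
            net = sum(c.weight * c.source.output for c in u.incoming)
            # Object-local operation: the activation update touches only
            # the unit's own state, so it can run wherever the unit is placed.
            u.output = math.tanh(net)

Because fan-in varies from unit to unit after pruning, the reduction step is where co-locating each connection with its target unit (data locality) and distributing connections evenly over processors (load balancing) pay off; these are the optimizations the speedup figures above quantify.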
