Exploiting Domain-Specific Properties: Compiling Parallel Dynamic Neural Network Algorithms into Efficient Code

Domain-specific constraints can be exploited to implement compiler optimizations that are not otherwise feasible. Compilers for neural network learning algorithms can achieve near-optimal colocality of data and processes and near-optimal load balancing across processors, even for dynamically irregular problems. This is impossible for general programs, but restricting programs to the neural algorithm domain allows the compiler to exploit domain-specific properties: neural algorithms perform only broadcasts, reductions, and object-local operations; the load distribution is regular with respect to the (possibly irregular) network topology; and the network topology changes only occasionally. A language, compilation techniques, and a compiler implementation on the MasPar MP-1 are described, and quantitative results for the effects of the various optimizations used in the compiler are shown. Conservative experiments with weight pruning algorithms yield performance improvements of 27 percent from load balancing and 195 percent from data locality, both relative to unoptimized versions. Two other optimizations, connection allocation and selecting the number of replicates, speed programs up by about 50 percent and 100 percent, respectively. This work can be viewed as a case study in exploiting domain-specific information; some of the principles presented here may apply to other domains as well.
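The following is a minimal sketch, in plain Python rather than CuPit-2, of the three operation kinds to which the abstract says neural programs are restricted. The class and method names (Unit, Net, train_step) are illustrative assumptions, not the paper's actual language constructs; the point is that each phase of a training step is either a broadcast, an object-local operation, or a reduction, which is what makes automatic data placement and load balancing tractable.

```python
class Unit:
    """One network object with purely local state (irregular fan-in)."""
    def __init__(self, weights):
        self.weights = weights                  # one weight per incoming connection
        self.gradients = [0.0] * len(weights)   # locally stored gradients
        self.error = 0.0                        # locally stored error contribution

    def local_update(self, rate):
        # Object-local operation: reads and writes only this unit's own state.
        self.weights = [w - rate * g
                        for w, g in zip(self.weights, self.gradients)]

class Net:
    def __init__(self, units):
        self.units = units                      # topology may be irregular

    def train_step(self, rate):
        # 1. Broadcast: the same scalar reaches every unit, so there is no
        #    unit-to-unit communication pattern for the compiler to analyze.
        # 2. Object-local phase: each unit updates independently, so the
        #    compiler may distribute units freely across processors.
        for u in self.units:
            u.local_update(rate)
        # 3. Reduction: one value per unit is combined associatively into a
        #    single global result (here, the summed error).
        return sum(u.error for u in self.units)
```

Because every step decomposes into these three patterns, a compiler can replicate units, reallocate connections, and rebalance load at the occasional topology-change points without whole-program analysis.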
