A Fully Pipelined FPGA Architecture of a Factored Restricted Boltzmann Machine Artificial Neural Network

Artificial neural networks (ANNs) are a natural target for hardware acceleration by FPGAs and GPGPUs: commercial-scale applications can take days to weeks to train on CPUs, and the algorithms are highly parallelizable. Previous work on FPGAs has shown how hardware parallelism can accelerate the “Restricted Boltzmann Machine” (RBM) ANN algorithm, and how to distribute the computation across multiple FPGAs. Here we describe a fully pipelined parallel architecture that exploits “mini-batch” training (combining many input cases to compute each set of weight updates) to further accelerate ANN training. We implement on an FPGA, for the first time to our knowledge, a more powerful variant of the basic RBM, the “Factored RBM” (fRBM). The fRBM has proved valuable in learning transformations and in discovering features that are present across multiple types of input. In simulation, we obtain a 100-fold acceleration over CPU software for an fRBM with N = 256 units in each of its four groups (two input groups, one output group, and one intermediate group) running on a Virtex-6 LX760 FPGA. Many of the architectural features we implement apply not only to fRBMs, but also to basic RBMs and to other ANN algorithms more broadly.
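
To make the mini-batch idea concrete, the following is a minimal software sketch, not the pipelined architecture described here. It assumes a basic RBM (not the fRBM) without bias terms, a single step of contrastive divergence, and real-valued probabilities in place of stochastic binary samples so that the result is deterministic. The names `minibatch_update`, `hidden_probs`, `visible_probs`, and the learning rate `lr` are hypothetical. The point it illustrates is that the contributions of all input cases in a batch are accumulated first, and the weights are updated once per batch.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def hidden_probs(v, W):
    # W[i][j] is the weight between visible unit i and hidden unit j.
    return [sigmoid(sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(W[0]))]


def visible_probs(h, W):
    return [sigmoid(sum(h[j] * W[i][j] for j in range(len(h))))
            for i in range(len(W))]


def minibatch_update(batch, W, lr=0.1):
    """Return new weights after one mini-batch (assumed CD-1, no biases)."""
    n_vis, n_hid = len(W), len(W[0])
    acc = [[0.0] * n_hid for _ in range(n_vis)]
    for v0 in batch:
        # Every case in the batch sees the same, not yet updated, weights.
        h0 = hidden_probs(v0, W)
        v1 = visible_probs(h0, W)   # one-step reconstruction
        h1 = hidden_probs(v1, W)
        for i in range(n_vis):
            for j in range(n_hid):
                # Data correlation minus reconstruction correlation.
                acc[i][j] += v0[i] * h0[j] - v1[i] * h1[j]
    # A single weight update for the whole batch, averaged over its cases.
    scale = lr / len(batch)
    return [[W[i][j] + scale * acc[i][j] for j in range(n_hid)]
            for i in range(n_vis)]


if __name__ == "__main__":
    W0 = [[0.0, 0.0], [0.0, 0.0]]
    print(minibatch_update([[1, 0], [1, 0]], W0))
```

Because no case in the batch depends on a weight change caused by another case in the same batch, the per-case work is independent until the final accumulation, which is the property that makes mini-batches attractive for parallel and pipelined hardware.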
