O3BNN-R: An Out-of-Order Architecture for High-Performance and Regularized BNN Inference
Wei Wu | Ang Li | Runbin Shi | Martin Herbordt | Tianqi Wang | Chunshu Wu | Tong Geng | Yanfei Li
[1] R. M. Tomasulo,et al. An efficient algorithm for exploiting multiple arithmetic units , 1967 .
[2] David A. Patterson,et al. Computer Architecture: A Quantitative Approach , 1990 .
[3] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[4] Ming Yang,et al. 3D Convolutional Neural Networks for Human Action Recognition , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[5] Fei-Fei Li,et al. Large-Scale Video Classification with Convolutional Neural Networks , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[6] Yoshua Bengio,et al. BinaryConnect: Training Deep Neural Networks with binary weights during propagations , 2015, NIPS.
[7] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.
[8] Timo Aila,et al. Pruning Convolutional Neural Networks for Resource Efficient Transfer Learning , 2016, ArXiv.
[9] Ran El-Yaniv,et al. Binarized Neural Networks , 2016, ArXiv.
[10] Ran El-Yaniv,et al. Binarized Neural Networks , 2016, NIPS.
[11] Manoj Alwani,et al. Fused-layer CNN accelerators , 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[12] Ali Farhadi,et al. XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks , 2016, ECCV.
[13] Eriko Nurvitadhi,et al. Accelerating Binarized Neural Networks: Comparison of FPGA, CPU, GPU, and ASIC , 2016, 2016 International Conference on Field-Programmable Technology (FPT).
[14] Chen Yang,et al. Novo-G#: Large-scale reconfigurable computing with direct and programmable interconnects , 2016, 2016 IEEE High Performance Extreme Computing Conference (HPEC).
[15] Vivienne Sze,et al. Designing Energy-Efficient Convolutional Neural Networks Using Energy-Aware Pruning , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[16] Philip Heng Wai Leong,et al. FINN: A Framework for Fast, Scalable Binarized Neural Network Inference , 2016, FPGA.
[17] Gang Hua,et al. How to Train a Compact Binary Neural Network with High Accuracy? , 2017, AAAI.
[18] Rajesh Gupta,et al. Accelerating Binarized Convolutional Neural Networks with Software-Programmable FPGAs , 2017, FPGA.
[19] Peng Zhang,et al. Automated systolic array architecture synthesis for high throughput CNN inference on FPGAs , 2017, 2017 54th ACM/EDAC/IEEE Design Automation Conference (DAC).
[20] Ran El-Yaniv,et al. Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations , 2016, J. Mach. Learn. Res.
[21] Shengen Yan,et al. Evaluating Fast Algorithms for Convolutional Neural Networks on FPGAs , 2017, 2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM).
[22] Xiangyu Zhang,et al. Channel Pruning for Accelerating Very Deep Neural Networks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[23] Chen Yang,et al. FPDeep: Acceleration and Load Balancing of CNN Training on FPGA Clusters , 2018, 2018 IEEE 26th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM).
[24] Hiroki Nakahara,et al. A Threshold Neuron Pruning for a Binarized Deep Neural Network on an FPGA , 2018, IEICE Trans. Inf. Syst..
[25] Farinaz Koushanfar,et al. ReBNet: Residual Binarized Neural Network , 2017, 2018 IEEE 26th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM).
[26] Jiangming Jin,et al. BitFlow: Exploiting Vector Parallelism for Binary Neural Networks on CPU , 2018, 2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[27] Kenneth O'Brien,et al. FINN-R: An End-to-End Deep-Learning Framework for Fast Exploration of Quantized Neural Networks , 2018 .
[28] Jing Li,et al. Adaptive Quantization of Neural Networks , 2018, ICLR.
[29] Wayne Luk,et al. FP-BNN: Binarized neural network on FPGA , 2018, Neurocomputing.
[30] Jiayi Sheng,et al. High Performance Communication on Reconfigurable Clusters , 2018, 2018 28th International Conference on Field Programmable Logic and Applications (FPL).
[31] Martin C. Herbordt,et al. BSTC: a novel binarized-soft-tensor-core design for accelerating bit-based approximated neural nets , 2019, SC.
[32] Martin C. Herbordt,et al. O3BNN: an out-of-order architecture for high-performance binarized neural network inference with fine-grained pruning , 2019, ICS.
[33] Tianqi Wang,et al. LP-BNN: Ultra-low-Latency BNN Inference with Layer Parallelism , 2019, 2019 IEEE 30th International Conference on Application-specific Systems, Architectures and Processors (ASAP).
[34] Martin C. Herbordt,et al. FPDeep: Scalable Acceleration of CNN Training on Deeply-Pipelined FPGA Clusters , 2020, IEEE Transactions on Computers.
[35] Antonino Tumeo,et al. AWB-GCN: A Graph Convolutional Network Accelerator with Runtime Workload Rebalancing , 2019, 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[36] C. Meinel,et al. MeliusNet: Can Binary Neural Networks Achieve MobileNet-level Accuracy? , 2020, ArXiv.