Accelerating Stochastic Gradient Descent Based Matrix Factorization on FPGA
[1] Rajesh Gupta,et al. Accelerating Binarized Convolutional Neural Networks with Software-Programmable FPGAs , 2017, FPGA.
[2] Nancy M. Amato,et al. Faster Parallel Traversal of Scale Free Graphs at Extreme Scale with Vertex Delegates , 2014, SC14: International Conference for High Performance Computing, Networking, Storage and Analysis.
[3] James C. Hoe,et al. GraphGen: An FPGA Framework for Vertex-Centric Graph Computation , 2014, 2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines.
[4] Viktor K. Prasanna,et al. High-Throughput and Energy-Efficient Graph Processing on FPGA , 2016, 2016 IEEE 24th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM).
[5] Inderjit S. Dhillon,et al. Parallel matrix factorization for recommender systems , 2014, Knowl. Inf. Syst.
[6] Chih-Jen Lin,et al. A Fast Parallel Stochastic Gradient Method for Matrix Factorization in Shared Memory Systems , 2015, ACM Trans. Intell. Syst. Technol.
[7] Vaclav Petricek,et al. Recommender System for Online Dating Service , 2007, ArXiv.
[8] Keshav Pingali,et al. Stochastic gradient descent on GPUs , 2015, GPGPU@PPoPP.
[9] Kathryn Fraughnaugh,et al. Introduction to graph theory , 1973, Mathematical Gazette.
[10] Viktor K. Prasanna,et al. Optimizing memory performance for FPGA implementation of pagerank , 2015, 2015 International Conference on ReConFigurable Computing and FPGAs (ReConFig).
[11] Liana L. Fong,et al. Faster and Cheaper: Parallelizing Large-Scale Matrix Factorization on GPUs , 2016, HPDC.
[12] Margaret Martonosi,et al. Graphicionado: A high-performance and energy-efficient accelerator for graph analytics , 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[13] Darshika G. Perera,et al. An efficient embedded multi-ported memory architecture for next-generation FPGAs , 2017, 2017 IEEE 28th International Conference on Application-specific Systems, Architectures and Processors (ASAP).
[14] Chen Yang,et al. FPDeep: Acceleration and Load Balancing of CNN Training on FPGA Clusters , 2018, 2018 IEEE 26th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM).
[15] Weimin Zheng,et al. Exploring the Hidden Dimension in Graph Processing , 2016, OSDI.
[16] Tom Feist,et al. Vivado Design Suite , 2012.
[17] Viktor K. Prasanna,et al. Sketch Acceleration on FPGA and its Applications in Network Anomaly Detection , 2018, IEEE Transactions on Parallel and Distributed Systems.
[18] Haibo Chen,et al. Bipartite-Oriented Distributed Graph Partitioning for Big Learning , 2014, Journal of Computer Science and Technology.
[19] Miriam Leeser,et al. Scaling Neural Network Performance through Customized Hardware Architectures on Reconfigurable Logic , 2017, 2017 IEEE International Conference on Computer Design (ICCD).
[20] Marc'Aurelio Ranzato,et al. Large Scale Distributed Deep Networks , 2012, NIPS.
[21] Léon Bottou,et al. Stochastic Gradient Descent Tricks , 2012, Neural Networks: Tricks of the Trade.
[22] Viktor K. Prasanna,et al. Accelerating low rank matrix completion on FPGA , 2017, 2017 International Conference on ReConFigurable Computing and FPGAs (ReConFig).
[23] Dong Yu,et al. On parallelizability of stochastic gradient descent for speech DNNs , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[24] Yun Liang,et al. CuMF_SGD: Parallelized Stochastic Gradient Descent for Matrix Factorization on GPUs , 2017, HPDC.
[25] Kunle Olukotun,et al. Understanding and optimizing asynchronous low-precision stochastic gradient descent , 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[26] Jason Cong,et al. Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks , 2015, FPGA.
[27] Thomas B. Preußer,et al. Inference of quantized neural networks on heterogeneous all-programmable devices , 2018, 2018 Design, Automation & Test in Europe Conference & Exhibition (DATE).
[28] J. Gregory Steffan,et al. Multi-ported memories for FPGAs via XOR , 2012, FPGA '12.
[29] Jeffrey Pennington,et al. GloVe: Global Vectors for Word Representation , 2014, EMNLP.
[30] Keqin Li,et al. MSGD: A Novel Matrix Factorization Approach for Large-Scale Collaborative Filtering Recommender Systems on GPUs , 2018, IEEE Transactions on Parallel and Distributed Systems.
[31] Philip Heng Wai Leong,et al. Kibo: An Open-Source Fixed-Point Tool-kit for Training and Inference in FPGA-Based Deep Learning Networks , 2018, 2018 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW).
[32] Arda Yurdakul,et al. Efficient Implementations of Multi-pumped Multi-port Register Files in FPGAs , 2013, 2013 Euromicro Conference on Digital System Design.
[33] Inderjit S. Dhillon,et al. A Scalable Asynchronous Distributed Algorithm for Topic Modeling , 2014, WWW.
[34] Yu Wang,et al. A Reconfigurable Computing Approach for Efficient and Scalable Parallel Graph Exploration , 2012, 2012 IEEE 23rd International Conference on Application-Specific Systems, Architectures and Processors.
[35] Pradeep Dubey,et al. GraphMat: High performance graph analytics made productive , 2015, Proc. VLDB Endow.
[36] Yehuda Koren,et al. Matrix Factorization Techniques for Recommender Systems , 2009, Computer.
[37] J. Gregory Steffan,et al. Efficient multi-ported memories for FPGAs , 2010, FPGA '10.
[38] Viktor K. Prasanna,et al. Accelerating Large-Scale Single-Source Shortest Path on FPGA , 2015, 2015 IEEE International Parallel and Distributed Processing Symposium Workshop.
[39] Viktor K. Prasanna,et al. FASTCF: FPGA-based Accelerator for STochastic-Gradient-Descent-based Collaborative Filtering , 2018, FPGA.
[40] Pradeep Dubey,et al. Navigating the maze of graph analytics frameworks using massive graph datasets , 2014, SIGMOD Conference.