Superneurons: dynamic GPU memory management for training deep neural networks
Linnan Wang | Jinmian Ye | Yiyang Zhao | Wei Wu | Ang Li | Shuaiwen Leon Song | Zenglin Xu | Tim Kraska
[1] Babak Hassibi, et al. Second Order Derivatives for Network Pruning: Optimal Brain Surgeon, 1992, NIPS.
[2] John Tran, et al. cuDNN: Efficient Primitives for Deep Learning, 2014, ArXiv.
[3] Tao Wang, et al. Deep learning with COTS HPC systems, 2013, ICML.
[4] Andrew Zisserman, et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.
[5] Natalia Gimelshein, et al. vDNN: Virtualized deep neural networks for scalable, memory-efficient neural network design, 2016, 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[6] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[7] Yi Yang, et al. BLASX: A High Performance Level-3 BLAS Library for Heterogeneous Multi-GPU Computing, 2015, ICS.
[8] George Bosilca, et al. Hierarchical DAG Scheduling for Hybrid Distributed Systems, 2015, IEEE International Parallel and Distributed Processing Symposium.
[9] Mohak Shah, et al. Comparative Study of Caffe, Neon, Theano, and Torch for Deep Learning, 2015, ArXiv.
[10] Simon Haykin, et al. Gradient-Based Learning Applied to Document Recognition, 2001.
[11] Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.
[12] Yoshua Bengio, et al. Gradient-based learning applied to document recognition, 1998, Proc. IEEE.
[13] Trevor Darrell, et al. Caffe: Convolutional Architecture for Fast Feature Embedding, 2014, ACM Multimedia.
[14] Natalie D. Enright Jerger, et al. Proteus: Exploiting Numerical Precision Variability in Deep Neural Networks, 2016, ICS.
[15] Dumitru Erhan, et al. Going deeper with convolutions, 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[16] Samy Bengio, et al. Torch: a modular machine learning software library, 2002.
[17] Yoshua Bengio, et al. Learning long-term dependencies with gradient descent is difficult, 1994, IEEE Trans. Neural Networks.
[18] Scott Shenker, et al. Spark: Cluster Computing with Working Sets, 2010, HotCloud.
[19] Marc'Aurelio Ranzato, et al. Large Scale Distributed Deep Networks, 2012, NIPS.
[20] Tianqi Chen, et al. Training Deep Nets with Sublinear Memory Cost, 2016, ArXiv.
[21] Zenglin Xu, et al. Efficient Communications in Training Large Scale Neural Networks, 2017, ACM Multimedia.
[23] Yuan Yu, et al. TensorFlow: A system for large-scale machine learning, 2016, OSDI.
[24] Kilian Q. Weinberger, et al. Memory-Efficient Implementation of DenseNets, 2017, ArXiv.
[25] Vincent Vanhoucke, et al. Improving the speed of neural networks on CPUs, 2011.
[26] Kilian Q. Weinberger, et al. Densely Connected Convolutional Networks, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[27] Sergey Ioffe, et al. Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning, 2016, AAAI.
[28] Song Han, et al. EIE: Efficient Inference Engine on Compressed Deep Neural Network, 2016, 43rd Annual ACM/IEEE International Symposium on Computer Architecture (ISCA).
[29] Zheng Zhang, et al. MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems, 2015, ArXiv.
[30] Yi Yang, et al. Accelerating Deep Neural Network Training with Inconsistent Stochastic Gradient Descent, 2016, Neural Networks.