HAN: a Hierarchical AutotuNed Collective Communication Framework
Xi Luo | Wei Wu | George Bosilca | Yu Pei | Qinglei Cao | Thananon Patinyasakdikul | Dong Zhong | Jack Dongarra