Efficient inter-node MPI communication using GPUDirect RDMA for InfiniBand clusters with NVIDIA GPUs S Potluri, K Hamidouche, A Venkatesh, D Bureddy, DK Panda 2013 42nd International Conference on Parallel Processing, 80-89, 2013 | 179 | 2013 |
Scalable hierarchical aggregation protocol (SHArP): A hardware architecture for efficient data reduction RL Graham, D Bureddy, P Lui, H Rosenstock, G Shainer, G Bloch, ... 2016 First International Workshop on Communication Optimizations in HPC …, 2016 | 142 | 2016 |
GPU-aware MPI on RDMA-enabled clusters: Design, implementation and evaluation H Wang, S Potluri, D Bureddy, C Rosales, DK Panda IEEE Transactions on Parallel and Distributed Systems 25 (10), 2595-2605, 2013 | 116 | 2013 |
Optimizing MPI communication on multi-GPU systems using CUDA inter-process communication S Potluri, H Wang, D Bureddy, AK Singh, C Rosales, DK Panda 2012 IEEE 26th International Parallel and Distributed Processing Symposium …, 2012 | 108 | 2012 |
OMB-GPU: A micro-benchmark suite for evaluating MPI libraries on GPU clusters D Bureddy, H Wang, A Venkatesh, S Potluri, DK Panda Recent Advances in the Message Passing Interface: 19th European MPI Users …, 2012 | 66 | 2012 |
MVAPICH-PRISM: A proxy-based communication framework using InfiniBand and SCIF for Intel MIC clusters S Potluri, D Bureddy, K Hamidouche, A Venkatesh, K Kandalla, ... Proceedings of the International Conference on High Performance Computing …, 2013 | 52 | 2013 |
Efficient intra-node communication on Intel-MIC clusters S Potluri, A Venkatesh, D Bureddy, K Kandalla, DK Panda 2013 13th IEEE/ACM International Symposium on Cluster, Cloud, and Grid …, 2013 | 49 | 2013 |
Extending OpenSHMEM for GPU computing S Potluri, D Bureddy, H Wang, H Subramoni, DK Panda 2013 IEEE 27th International Symposium on Parallel and Distributed …, 2013 | 44 | 2013 |
Designing optimized MPI broadcast and allreduce for many integrated core (MIC) InfiniBand clusters K Kandalla, A Venkatesh, K Hamidouche, S Potluri, D Bureddy, DK Panda 2013 IEEE 21st Annual Symposium on High-Performance Interconnects, 63-70, 2013 | 33 | 2013 |
Intra-MIC MPI communication using MVAPICH2: Early experience S Potluri, K Tomko, D Bureddy, DK Panda TACC-Intel Highly-Parallel Computing Symposium, 2012 | 21 | 2012 |
Design of network topology aware scheduling services for large InfiniBand clusters H Subramoni, D Bureddy, K Kandalla, K Schulz, B Barth, J Perkins, ... 2013 IEEE International Conference on Cluster Computing (CLUSTER), 1-8, 2013 | 18 | 2013 |
MVAPICH2-MIC: A high performance MPI library for Xeon Phi clusters with InfiniBand S Potluri, K Hamidouche, D Bureddy, DK Panda 2013 Extreme Scaling Workshop (XSW 2013), 25-32, 2013 | 18 | 2013 |
Efficient intranode designs for OpenSHMEM on multicore clusters S Potluri, K Kandalla, D Bureddy, M Li, DK Panda The 6th Conference on Partitioned Global Address Space, PGAS, 2012 | 10 | 2012 |
Towards a data centric system architecture: SHARP R Graham, G Bloch, D Bureddy, G Shainer, B Smith Supercomputing Frontiers and Innovations 4 (4), 4-16, 2017 | 5 | 2017 |
Design and implementation of key proposed MPI-3 one-sided communication semantics on InfiniBand S Potluri, S Sur, D Bureddy, DK Panda Recent Advances in the Message Passing Interface: 18th European MPI Users …, 2011 | 4 | 2011 |
2014 Index IEEE Transactions on Parallel and Distributed Systems Vol. 25 M Abdelhakim, T Abe, D Abramson, N Abu-Ghazaleh, ME Acacio, ... IEEE Transactions on Parallel and Distributed Systems 26 (1), 291, 2015 | | 2015 |
COMHPC 2016 RL Graham, D Bureddy, P Lui, H Rosenstock, G Shainer, G Bloch, ... | | |