Using DeepSpeed and Megatron to train Megatron-Turing NLG 530B, a large-scale generative language model S Smith, M Patwary, B Norick, P LeGresley, S Rajbhandari, J Casper, ... arXiv preprint arXiv:2201.11990, 2022 | 615 | 2022 |
Efficient large-scale language model training on GPU clusters using Megatron-LM D Narayanan, M Shoeybi, J Casper, P LeGresley, M Patwary, ... Proceedings of the International Conference for High Performance Computing …, 2021 | 596 | 2021 |
Reducing activation recomputation in large transformer models VA Korthikanti, J Casper, S Lym, L McAfee, M Andersch, M Shoeybi, ... Proceedings of Machine Learning and Systems 5, 341-353, 2023 | 178 | 2023 |
Synthesizing geometry constructions S Gulwani, VA Korthikanti, A Tiwari ACM SIGPLAN Notices 46 (6), 50-61, 2011 | 175 | 2011 |
Using DeepSpeed and Megatron to train Megatron-Turing NLG 530B, a large … S Smith, M Patwary, B Norick, P LeGresley, S Rajbhandari, J Casper, ..., Rewon Child, Reza Yazdani Aminabadi, Julie Bernauer, Xia Song, Mohammad Shoeybi, Yuxiong He, Michael Houston, Saurabh Tiwary, and Bryan Catanzaro, 2022 | 137 | 2022 |
Towards optimizing energy costs of algorithms for shared memory architectures VA Korthikanti, G Agha Proceedings of the twenty-second annual ACM symposium on Parallelism in …, 2010 | 74 | 2010 |
Reasoning about MDPs as transformers of probability distributions VA Korthikanti, M Viswanathan, G Agha, YM Kwon 2010 Seventh International Conference on the Quantitative Evaluation of …, 2010 | 56 | 2010 |
Analysis of parallel algorithms for energy conservation in scalable multicore architectures VA Korthikanti, G Agha 2009 International Conference on Parallel Processing, 212-219, 2009 | 56 | 2009 |
Model checking MDPs with a unique compact invariant set of distributions R Chadha, VA Korthikanti, M Viswanathan, G Agha, YM Kwon 2011 Eighth International Conference on Quantitative Evaluation of SysTems …, 2011 | 25 | 2011 |
An Empirical Study of Mamba-based Language Models R Waleffe, W Byeon, D Riach, B Norick, V Korthikanti, T Dao, A Gu, ... arXiv preprint arXiv:2406.07887, 2024 | 22 | 2024 |
Re-ViLM: Retrieval-augmented visual language model for zero and few-shot image captioning Z Yang, W Ping, Z Liu, V Korthikanti, W Nie, DA Huang, L Fan, Z Yu, S Lan, ... arXiv preprint arXiv:2302.04858, 2023 | 22 | 2023 |
Using DeepSpeed and Megatron to train Megatron-Turing NLG 530B, a large-scale generative language model. arXiv 2022 S Smith, M Patwary, B Norick, P LeGresley, S Rajbhandari, J Casper, ... arXiv preprint arXiv:2201.11990 | 21 | |
Fair k mutual exclusion algorithm for peer to peer systems VA Reddy, P Mittal, I Gupta 2008 The 28th International Conference on Distributed Computing Systems, 655-662, 2008 | 20 | 2008 |
Using DeepSpeed and Megatron to train Megatron-Turing NLG 530B, a large-scale generative language model. arXiv S Smith, M Patwary, B Norick, P LeGresley, S Rajbhandari, J Casper, ... Preprint published online January 28, 2022 | 14 | 2022 |
Energy-performance trade-off analysis of parallel algorithms VA Korthikanti, G Agha USENIX Workshop on Hot Topics in Parallelism (HotPar), 2010 | 13 | 2010 |
Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B S Smith, M Patwary, B Norick, P LeGresley, S Rajbhandari, J Casper, ... A large-scale generative language model, 2022 | 12 | 2022 |
On the energy complexity of parallel algorithms VA Korthikanti, G Agha, M Greenstreet 2011 International Conference on Parallel Processing, 562-570, 2011 | 11 | 2011 |
Avoiding energy wastage in parallel applications VA Korthikanti, G Agha International Conference on Green Computing, 149-163, 2010 | 11 | 2010 |
An efficient algorithm to reduce test power consumption by scan cell and scan vector reordering KVA Reddy, S Chattopadhyay Proceedings of the IEEE INDICON 2004. First India Annual Conference, 2004 …, 2004 | 10 | 2004 |
Energy bounded scalability analysis of parallel algorithms VA Korthikanti, GA Agha | 9 | 2009 |