GLM-130B: An open bilingual pre-trained model A Zeng, X Liu, Z Du, Z Wang, H Lai, M Ding, Z Yang, Y Xu, W Zheng, X Xia, ... arXiv preprint arXiv:2210.02414, 2022 | 260 | 2022 |
PET: Optimizing tensor programs with partially equivalent transformations and automated corrections H Wang, J Zhai, M Gao, Z Ma, S Tang, L Zheng, Y Li, K Rong, Y Chen, ... 15th USENIX Symposium on Operating Systems Design and Implementation (OSDI …, 2021 | 54 | 2021 |
BaGuaLu: Targeting Brain Scale Pretrained Models with over 37 Million Cores Z Ma, J He, J Qiu, H Cao, Y Wang, Z Sun, L Zheng, H Wang, S Tang, ... Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of …, 2022 | 36 | 2022 |
RisGraph: A real-time streaming system for evolving graphs to support sub-millisecond per-update analysis at millions ops/s G Feng, Z Ma, D Li, S Chen, X Zhu, W Han, W Chen Proceedings of the 2021 International Conference on Management of Data, 513-527, 2021 | 27 | 2021 |
Scaling graph traversal to 281 trillion edges with 40 million cores H Cao, Y Wang, H Wang, H Lin, Z Ma, W Yin, W Chen Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of …, 2022 | 14 | 2022 |
SmartMoE: Efficiently Training Sparsely-Activated Models through Combining Offline and Online Parallelization M Zhai, J He, Z Ma, Z Zong, R Zhang, J Zhai 2023 USENIX Annual Technical Conference (USENIX ATC 23), 961-975, 2023 | 10 | 2023 |
TriCache: A user-transparent block cache enabling high-performance out-of-core processing with in-memory programs G Feng, H Cao, X Zhu, B Yu, Y Wang, Z Ma, S Chen, W Chen ACM Transactions on Storage 19 (2), 1-30, 2023 | 9 | 2023 |
UniQ: a unified programming model for efficient quantum circuit simulation C Zhang, H Wang, Z Ma, L Xie, Z Song, J Zhai SC22: International Conference for High Performance Computing, Networking …, 2022 | 5 | 2022 |
Scaling graph 500 SSSP to 140 trillion edges with over 40 million cores Y Wang, H Cao, Z Ma, W Yin, W Chen 2022 SC22: International Conference for High Performance Computing …, 2022 | 4 | 2022 |
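System challenges and countermeasures for efficiently training pre-trained models with 100 trillion parameters (in Chinese) Z Ma, J Zhai, W Han ZTE Technology Journal 28 (2), 51-58, 2022 | 3 | 2022 |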
EINNET: Optimizing Tensor Programs with Derivation-Based Transformations L Zheng, H Wang, J Zhai, M Hu, Z Ma, T Wang, S Huang, X Miao, S Tang, ... 17th USENIX Symposium on Operating Systems Design and Implementation (OSDI …, 2023 | 2 | 2023 |
OLLIE: Derivation-based tensor program optimizer L Zheng, H Wang, J Zhai, M Hu, Z Ma, T Wang, S Tang, L Xie, K Huang, ... arXiv preprint arXiv:2208.02025, 2022 | 2 | 2022 |
Optimizing DNNs with partially equivalent transformations and automated corrections H Wang, J Zhai, M Gao, F Zhang, T Wang, Z Ma, S Tang, L Zheng, ... IEEE Transactions on Computers, 2023 | 1 | 2023 |
Unified Programming Models for Heterogeneous High-Performance Computers ZX Ma, YY Jin, SZ Tang, HJ Wang, WC Xue, JD Zhai, WM Zheng Journal of Computer Science and Technology 38 (1), 211-218, 2023 | 1 | 2023 |
Efficiently emulating high-bitwidth computation with low-bitwidth hardware Z Ma, H Wang, G Feng, C Zhang, L Xie, J He, S Chen, J Zhai Proceedings of the 36th ACM International Conference on Supercomputing, 1-12, 2022 | 1 | 2022 |
Efficient Asynchronous Performance Prediction for Heterogeneous Systems Y Jin, Z Ma, J Zhai Chinese Journal of Computational Physics 41 (1), 40, 2024 | | 2024 |
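An asynchrony-aware performance prediction method for heterogeneous high-performance computers (in Chinese) Y Jin, Z Ma, J Zhai Chinese Journal of Computational Physics 41 (1), 40, 2024 | | 2024 |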
PowerFusion: A Tensor Compiler with Explicit Data Movement Description and Instruction-level Graph IR Z Ma, H Wang, J Xing, L Zheng, C Zhang, H Cao, K Huang, S Tang, ... arXiv preprint arXiv:2307.04995, 2023 | | 2023 |
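Unified programming models for heterogeneous high-performance computers (in Chinese) Z Ma, Y Jin, S Tang, H Wang, W Xue, J Zhai, W Zheng Journal of Computer Science and Technology 38 (1), 211-218, 2023 | | 2023 |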
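An efficient memory allocator for the new-generation Sunway supercomputer (in Chinese) H Wang, Z Ma, L Zheng, Y Wang, F Wang, J Zhai Journal of Tsinghua University (Science and Technology), 2022 | | 2022 |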