Zixuan Ma
Verified email at mails.tsinghua.edu.cn
Title · Cited by · Year
GLM-130B: An open bilingual pre-trained model
A Zeng, X Liu, Z Du, Z Wang, H Lai, M Ding, Z Yang, Y Xu, W Zheng, X Xia, ...
arXiv preprint arXiv:2210.02414, 2022
Cited by 260, 2022
PET: Optimizing tensor programs with partially equivalent transformations and automated corrections
H Wang, J Zhai, M Gao, Z Ma, S Tang, L Zheng, Y Li, K Rong, Y Chen, ...
15th USENIX Symposium on Operating Systems Design and Implementation (OSDI …, 2021
Cited by 54, 2021
BaGuaLu: Targeting Brain Scale Pretrained Models with over 37 Million Cores
Z Ma, J He, J Qiu, H Cao, Y Wang, Z Sun, L Zheng, H Wang, S Tang, ...
Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of …, 2022
Cited by 36, 2022
RisGraph: A real-time streaming system for evolving graphs to support sub-millisecond per-update analysis at millions ops/s
G Feng, Z Ma, D Li, S Chen, X Zhu, W Han, W Chen
Proceedings of the 2021 International Conference on Management of Data, 513-527, 2021
Cited by 27, 2021
Scaling graph traversal to 281 trillion edges with 40 million cores
H Cao, Y Wang, H Wang, H Lin, Z Ma, W Yin, W Chen
Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of …, 2022
Cited by 14, 2022
SmartMoE: Efficiently Training Sparsely-Activated Models through Combining Offline and Online Parallelization
M Zhai, J He, Z Ma, Z Zong, R Zhang, J Zhai
2023 USENIX Annual Technical Conference (USENIX ATC 23), 961-975, 2023
Cited by 10, 2023
TriCache: A user-transparent block cache enabling high-performance out-of-core processing with in-memory programs
G Feng, H Cao, X Zhu, B Yu, Y Wang, Z Ma, S Chen, W Chen
ACM Transactions on Storage 19 (2), 1-30, 2023
Cited by 9, 2023
UniQ: a unified programming model for efficient quantum circuit simulation
C Zhang, H Wang, Z Ma, L Xie, Z Song, J Zhai
SC22: International Conference for High Performance Computing, Networking …, 2022
Cited by 5, 2022
Scaling graph 500 SSSP to 140 trillion edges with over 40 million cores
Y Wang, H Cao, Z Ma, W Yin, W Chen
2022 SC22: International Conference for High Performance Computing …, 2022
Cited by 4, 2022
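System challenges and countermeasures for efficiently training pre-trained models with one hundred trillion parameters
Z Ma, J Zhai, W Han
ZTE Technology Journal 28 (2), 51-58, 2022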
Cited by 3, 2022
EINNET: Optimizing Tensor Programs with Derivation-Based Transformations
L Zheng, H Wang, J Zhai, M Hu, Z Ma, T Wang, S Huang, X Miao, S Tang, ...
17th USENIX Symposium on Operating Systems Design and Implementation (OSDI …, 2023
Cited by 2, 2023
OLLIE: Derivation-based tensor program optimizer
L Zheng, H Wang, J Zhai, M Hu, Z Ma, T Wang, S Tang, L Xie, K Huang, ...
arXiv preprint arXiv:2208.02025, 2022
Cited by 2, 2022
Optimizing DNNs with partially equivalent transformations and automated corrections
H Wang, J Zhai, M Gao, F Zhang, T Wang, Z Ma, S Tang, L Zheng, ...
IEEE Transactions on Computers, 2023
Cited by 1, 2023
Unified Programming Models for Heterogeneous High-Performance Computers
ZX Ma, YY Jin, SZ Tang, HJ Wang, WC Xue, JD Zhai, WM Zheng
Journal of Computer Science and Technology 38 (1), 211-218, 2023
Cited by 1, 2023
Efficiently emulating high-bitwidth computation with low-bitwidth hardware
Z Ma, H Wang, G Feng, C Zhang, L Xie, J He, S Chen, J Zhai
Proceedings of the 36th ACM International Conference on Supercomputing, 1-12, 2022
Cited by 1, 2022
Efficient Asynchronous Performance Prediction for Heterogeneous Systems
Y Jin, Z Ma, J Zhai
Chinese Journal of Computational Physics 41 (1), 40, 2024
2024
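Asynchrony-aware performance prediction method for heterogeneous high-performance computers
Y Jin, Z Ma, J Zhai
Chinese Journal of Computational Physics 41 (1), 40, 2024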
2024
PowerFusion: A Tensor Compiler with Explicit Data Movement Description and Instruction-level Graph IR
Z Ma, H Wang, J Xing, L Zheng, C Zhang, H Cao, K Huang, S Tang, ...
arXiv preprint arXiv:2307.04995, 2023
2023
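Unified programming models for heterogeneous high-performance computers
Z Ma, Y Jin, S Tang, H Wang, W Xue, J Zhai, W Zheng
Journal of Computer Science and Technology 38 (1), 211-218, 2023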
2023
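An efficient memory allocator for the new-generation Sunway supercomputer
H Wang, Z Ma, L Zheng, Y Wang, F Wang, J Zhai
Journal of Tsinghua University (Science and Technology), 2022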
2022