ROLLER: Fast and efficient tensor compilation for deep learning | H Zhu, R Wu, Y Diao, S Ke, H Li, C Zhang, J Xue, L Ma, Y Xia, W Cui, ... | 16th USENIX Symposium on Operating Systems Design and Implementation (OSDI …, 2022 | 40 | 2022 |
HyQuas: hybrid partitioner based quantum circuit simulation system on GPU | C Zhang, Z Song, H Wang, K Rong, J Zhai | Proceedings of the ACM International Conference on Supercomputing, 443-454, 2021 | 16 | 2021 |
FreeTensor: a free-form DSL with holistic optimizations for irregular tensor programs | S Tang, J Zhai, H Wang, L Jiang, L Zheng, Z Yuan, C Zhang | Proceedings of the 43rd ACM SIGPLAN International Conference on Programming …, 2022 | 8 | 2022 |
PerFlow: A domain specific framework for automatic performance analysis of parallel applications | Y Jin, H Wang, R Zhong, C Zhang, J Zhai | Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of …, 2022 | 7 | 2022 |
UniQ: a unified programming model for efficient quantum circuit simulation | C Zhang, H Wang, Z Ma, L Xie, Z Song, J Zhai | SC22: International Conference for High Performance Computing, Networking …, 2022 | 5 | 2022 |
Cocktailer: Analyzing and optimizing dynamic control flow in deep learning | C Zhang, L Ma, J Xue, Y Shi, Z Miao, F Yang, J Zhai, Z Yang, M Yang | 17th USENIX Symposium on Operating Systems Design and Implementation (OSDI …, 2023 | 3 | 2023 |
Critique of “Planetary Normal Mode Computation: Parallel Algorithms, Performance, and Reproducibility” by SCC Team From Tsinghua University | C Zhang, C Zhao, J He, S Chen, L Zheng, K Huang, W Han, J Zhai | IEEE Transactions on Parallel and Distributed Systems 32 (11), 2631-2634, 2021 | 2 | 2021 |
Efficiently emulating high-bitwidth computation with low-bitwidth hardware | Z Ma, H Wang, G Feng, C Zhang, L Xie, J He, S Chen, J Zhai | Proceedings of the 36th ACM International Conference on Supercomputing, 1-12, 2022 | 1 | 2022 |
Critique of “MemXCT: memory-centric X-ray CT reconstruction with massive parallelization” by SCC Team from Tsinghua University | R Zhong, J Chen, C Zhang, M Zhai, Z Song, Y Wang, W Han, L Gan, ... | IEEE Transactions on Parallel and Distributed Systems 33 (9), 2050-2053, 2021 | | 2021 |
A Fast Lock for Explicit Message Passing Architectures | X Tang, C Zhang, J Zhai, X Qian, W Chen, Y Jiang | IEEE Transactions on Computers 70 (10), 1555-1568, 2020 | | 2020 |