Haojie Wang
Verified email at tsinghua.edu.cn
Title
Cited by
Year
PET: Optimizing tensor programs with partially equivalent transformations and automated corrections
H Wang, J Zhai, M Gao, Z Ma, S Tang, L Zheng, Y Li, K Rong, Y Chen, ...
15th USENIX Symposium on Operating Systems Design and Implementation (OSDI …, 2021
Cited by: 54 | Year: 2021
BaGuaLu: targeting brain scale pretrained models with over 37 million cores
Z Ma, J He, J Qiu, H Cao, Y Wang, Z Sun, L Zheng, H Wang, S Tang, ...
Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of …, 2022
Cited by: 36 | Year: 2022
FasterMoE: modeling and optimizing training of large-scale dynamic pre-trained models
J He, J Zhai, T Antunes, H Wang, F Luo, S Shi, Q Li
Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of …, 2022
Cited by: 34 | Year: 2022
Spindle: Informed memory access monitoring
H Wang, J Zhai, X Tang, B Yu, X Ma, W Chen
2018 USENIX Annual Technical Conference (USENIX ATC 18), 561-574, 2018
Cited by: 20 | Year: 2018
HyQuas: hybrid partitioner based quantum circuit simulation system on GPU
C Zhang, Z Song, H Wang, K Rong, J Zhai
Proceedings of the ACM International Conference on Supercomputing, 443-454, 2021
Cited by: 16 | Year: 2021
Scaling graph traversal to 281 trillion edges with 40 million cores
H Cao, Y Wang, H Wang, H Lin, Z Ma, W Yin, W Chen
Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of …, 2022
Cited by: 14 | Year: 2022
Spread-n-share: improving application performance and cluster throughput with resource-aware job placement
X Tang, H Wang, X Ma, N El-Sayed, J Zhai, W Chen, A Aboulnaga
Proceedings of the International Conference for High Performance Computing …, 2019
Cited by: 14 | Year: 2019
TC-Stream: Large-Scale Graph Triangle Counting on a Single Machine Using GPUs
J Huang, H Wang, X Fei, X Wang, W Chen
IEEE Transactions on Parallel and Distributed Systems 33 (11), 3067-3078, 2021
Cited by: 9 | Year: 2021
ScalAna: Automating scaling loss detection with graph analysis
Y Jin, H Wang, T Yu, X Tang, T Hoefler, X Liu, J Zhai
SC20: International Conference for High Performance Computing, Networking …, 2020
Cited by: 9 | Year: 2020
FreeTensor: a free-form DSL with holistic optimizations for irregular tensor programs
S Tang, J Zhai, H Wang, L Jiang, L Zheng, Z Yuan, C Zhang
Proceedings of the 43rd ACM SIGPLAN International Conference on Programming …, 2022
Cited by: 8 | Year: 2022
PerFlow: A domain specific framework for automatic performance analysis of parallel applications
Y Jin, H Wang, R Zhong, C Zhang, J Zhai
Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of …, 2022
Cited by: 7 | Year: 2022
LotusSQL: SQL engine for high-performance big data systems
X Li, B Yu, G Feng, H Wang, W Chen
Big Data Mining and Analytics 4 (4), 252-265, 2021
Cited by: 7 | Year: 2021
UniQ: a unified programming model for efficient quantum circuit simulation
C Zhang, H Wang, Z Ma, L Xie, Z Song, J Zhai
SC22: International Conference for High Performance Computing, Networking …, 2022
Cited by: 5 | Year: 2022
Vapro: Performance variance detection and diagnosis for production-run parallel applications
L Zheng, J Zhai, X Tang, H Wang, T Yu, Y Jin, SL Song, W Chen
Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of …, 2022
Cited by: 4 | Year: 2022
OLLIE: Derivation-based tensor program optimizer
L Zheng, H Wang, J Zhai, M Hu, Z Ma, T Wang, S Tang, L Xie, K Huang, ...
arXiv preprint arXiv:2208.02025, 2022
Cited by: 2 | Year: 2022
Identifying scalability bottlenecks for large-scale parallel programs with graph analysis
Y Jin, H Wang, X Tang, T Hoefler, X Liu, J Zhai
Proceedings of the 25th ACM SIGPLAN Symposium on Principles and Practice of …, 2020
Cited by: 2 | Year: 2020
Unified Programming Models for Heterogeneous High-Performance Computers
ZX Ma, YY Jin, SZ Tang, HJ Wang, WC Xue, JD Zhai, WM Zheng
Journal of Computer Science and Technology 38 (1), 211-218, 2023
Cited by: 1 | Year: 2023
An Efficient Sparse CNNs Accelerator on FPGA
Y Zhang, H Jiang, X Li, H Wang, D Dong, Y Cao
2022 IEEE International Conference on Cluster Computing (CLUSTER), 504-505, 2022
Cited by: 1 | Year: 2022
Efficiently emulating high-bitwidth computation with low-bitwidth hardware
Z Ma, H Wang, G Feng, C Zhang, L Xie, J He, S Chen, J Zhai
Proceedings of the 36th ACM International Conference on Supercomputing, 1-12, 2022
Cited by: 1 | Year: 2022
Detecting performance variance for parallel applications without source code
J Zhai, L Zheng, F Zhang, X Tang, H Wang, T Yu, Y Jin, SL Song, W Chen
IEEE Transactions on Parallel and Distributed Systems 33 (12), 4239-4255, 2022
Cited by: 1 | Year: 2022