Automated systolic array architecture synthesis for high throughput CNN inference on FPGAs X Wei, CH Yu, P Zhang, Y Chen, Y Wang, H Hu, Y Liang, J Cong Proceedings of the 54th Annual Design Automation Conference 2017, 1-6, 2017 | 454 | 2017 |
Ansor: Generating High-Performance tensor programs for deep learning L Zheng, C Jia, M Sun, Z Wu, CH Yu, A Haj-Ali, Y Wang, J Yang, D Zhuo, ... 14th USENIX symposium on operating systems design and implementation (OSDI …, 2020 | 301 | 2020 |
Efficient memory management for large language model serving with PagedAttention W Kwon, Z Li, S Zhuang, Y Sheng, L Zheng, CH Yu, J Gonzalez, H Zhang, ... Proceedings of the 29th Symposium on Operating Systems Principles, 611-626, 2023 | 262 | 2023 |
Programming and runtime support to blaze FPGA accelerator deployment at datacenter scale M Huang, D Wu, CH Yu, Z Fang, M Interlandi, T Condie, J Cong Proceedings of the Seventh ACM Symposium on Cloud Computing, 456-469, 2016 | 109 | 2016 |
HeteroCL: A multi-paradigm programming infrastructure for software-defined reconfigurable computing YH Lai, Y Chi, Y Hu, J Wang, CH Yu, Y Zhou, J Cong, Z Zhang Proceedings of the 2019 ACM/SIGDA International Symposium on Field …, 2019 | 102 | 2019 |
TGPA: tile-grained pipeline architecture for low latency CNN inference X Wei, Y Liang, X Li, CH Yu, P Zhang, J Cong 2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), 1-8, 2018 | 74 | 2018 |
Automated accelerator generation and optimization with composable, parallel and pipeline architecture J Cong, P Wei, CH Yu, P Zhang Proceedings of the 55th Annual Design Automation Conference, 1-6, 2018 | 72 | 2018 |
AutoDSE: Enabling software programmers to design efficient FPGA accelerators A Sohrabizadeh, CH Yu, M Gao, J Cong ACM Transactions on Design Automation of Electronic Systems (TODAES) 27 (4 …, 2022 | 65 | 2022 |
Bandwidth Optimization Through On-Chip Memory Restructuring for HLS J Cong, P Wei, CH Yu, P Zhou | 64 | 2017 |
The SMEM Seeding Acceleration for DNA Sequence Alignment MCF Chang, YT Chen, J Cong, PT Huang, CL Kuo, CH Yu The 24th IEEE International Symposium on Field-Programmable Custom Computing …, 2016 | 60 | 2016 |
Systems and methods for systolic array design from a high-level program P Zhang, CH Yu, X Wei, P Pan US Patent 10,838,910, 2020 | 57 | 2020 |
S2FA: An accelerator automation framework for heterogeneous computing in datacenters CH Yu, P Wei, M Grossman, P Zhang, V Sarker, J Cong Proceedings of the 55th Annual Design Automation Conference, 1-6, 2018 | 43 | 2018 |
TensorIR: An abstraction for automatic tensorized program optimization S Feng, B Hou, H Jin, W Lin, J Shao, R Lai, Z Ye, L Zheng, CH Yu, Y Yu, ... Proceedings of the 28th ACM International Conference on Architectural …, 2023 | 38 | 2023 |
On the preconditioner of conjugate gradient method: a power grid simulation perspective CH Chou, NY Tsai, H Yu, CR Lee, Y Shi, SC Chang Proceedings of the International Conference on Computer-Aided Design, 494-497, 2010 | 37 | 2010 |
Best-effort FPGA programming: A few steps can go a long way J Cong, Z Fang, Y Hao, P Wei, CH Yu, C Zhang, P Zhou arXiv preprint arXiv:1807.01340, 2018 | 33 | 2018 |
Heterogeneous datacenters: Options and opportunities J Cong, M Huang, D Wu, CH Yu Proceedings of the 53rd Annual Design Automation Conference, 1-6, 2016 | 28 | 2016 |
DietCode: Automatic optimization for dynamic tensor programs B Zheng, Z Jiang, CH Yu, H Shen, J Fromm, Y Liu, Y Wang, L Ceze, ... Proceedings of Machine Learning and Systems 4, 848-863, 2022 | 24 | 2022 |
Analysis and optimization of the implicit broadcasts in FPGA HLS to improve maximum frequency L Guo, J Lau, Y Chi, J Wang, CH Yu, Z Chen, Z Zhang, J Cong 2020 57th ACM/IEEE Design Automation Conference (DAC), 1-6, 2020 | 24 | 2020 |
Latte: Locality Aware Transformation for High-Level Synthesis J Cong, P Wei, CH Yu, P Zhou | 23 | 2018 |
Useful-skew clock optimization for multi-power mode designs HM Chou, H Yu, SC Chang 2011 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), 647-650, 2011 | 22 | 2011 |