AWB-GCN: A graph convolutional network accelerator with runtime workload rebalancing T Geng, A Li, R Shi, C Wu, T Wang, Y Li, P Haghi, A Tumeo, S Che, ... 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture …, 2020 | 247 | 2020 |
FPDeep: Acceleration and load balancing of CNN training on FPGA clusters T Geng, T Wang, A Sanaullah, C Yang, R Xu, R Patel, M Herbordt 2018 IEEE 26th Annual International Symposium on Field-Programmable Custom …, 2018 | 101 | 2018 |
I-GCN: A graph convolutional network accelerator with runtime locality enhancement through islandization T Geng, C Wu, Y Zhang, C Tan, C Xie, H You, M Herbordt, Y Lin, A Li MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture …, 2021 | 86 | 2021 |
Accelerating Transformer-based Deep Learning Models on FPGAs using Column Balanced Block Pruning H Peng, S Huang, T Geng, A Li, W Jiang, H Liu, S Wang, C Ding The 22nd International Symposium on Quality Electronic Design, 1-8, 2021 | 76 | 2021 |
A framework for acceleration of CNN training on deeply-pipelined FPGA clusters with work and weight load balancing T Geng, T Wang, A Sanaullah, C Yang, R Patel, M Herbordt 2018 28th International Conference on Field Programmable Logic and …, 2018 | 74 | 2018 |
FPDeep: Scalable Acceleration of CNN Training on Deeply-Pipelined FPGA Clusters T Geng, T Wang, A Li, X Jin, M Herbordt IEEE Transactions on Computers, 2020 | 56* | 2020 |
Fully integrated FPGA molecular dynamics simulations C Yang, T Geng, T Wang, R Patel, Q Xiong, A Sanaullah, C Wu, J Sheng, ... Proceedings of the International Conference for High Performance Computing …, 2019 | 53 | 2019 |
LP-BNN: Ultra-low-latency BNN inference with layer parallelism T Geng, T Wang, C Wu, C Yang, SL Song, A Li, M Herbordt 2019 IEEE 30th International Conference on Application-specific Systems …, 2019 | 44 | 2019 |
Tripartite feature enhanced pyramid network for dense prediction D Liu, J Liang, T Geng, A Loui, T Zhou IEEE Transactions on Image Processing, 2023 | 39 | 2023 |
GCoD: Graph convolutional network acceleration via dedicated algorithm and accelerator co-design H You, T Geng, Y Zhang, A Li, Y Lin 2022 IEEE International Symposium on High-Performance Computer Architecture …, 2022 | 38 | 2022 |
BSTC: A novel binarized-soft-tensor-core design for accelerating bit-based approximated neural nets A Li, T Geng, T Wang, M Herbordt, SL Song, K Barker Proceedings of the International Conference for High Performance Computing …, 2019 | 38 | 2019 |
A length adaptive algorithm-hardware co-design of transformer on FPGA through sparse attention and dynamic pipelining H Peng, S Huang, S Chen, B Li, T Geng, A Li, W Jiang, W Wen, J Bi, H Liu, ... Proceedings of the 59th ACM/IEEE Design Automation Conference, 1135-1140, 2022 | 37 | 2022 |
GhostSZ: A transparent FPGA-accelerated lossy compression framework Q Xiong, R Patel, C Yang, T Geng, A Skjellum, MC Herbordt 2019 IEEE 27th Annual International Symposium on Field-Programmable Custom …, 2019 | 35 | 2019 |
O3BNN-R: An out-of-order architecture for high-performance and regularized BNN inference T Geng, A Li, T Wang, C Wu, Y Li, R Shi, W Wu, M Herbordt IEEE Transactions on Parallel and Distributed Systems 32 (1), 199-213, 2021 | 34 | 2021 |
APNN-TC: Accelerating arbitrary precision neural networks on Ampere GPU tensor cores B Feng, Y Wang, T Geng, A Li, Y Ding Proceedings of the International Conference for High Performance Computing …, 2021 | 29 | 2021 |
O3BNN: an out-of-order architecture for high-performance binarized neural network inference with fine-grained pruning T Geng, T Wang, C Wu, C Yang, W Wu, A Li, MC Herbordt Proceedings of the ACM International Conference on Supercomputing (ICS), 461-472, 2019 | 28 | 2019 |
Towards sparsification of graph neural networks H Peng, D Gurevin, S Huang, T Geng, W Jiang, O Khan, C Ding 2022 IEEE 40th International Conference on Computer Design (ICCD), 272-279, 2022 | 25 | 2022 |
FPGAs in the network and novel communicator support accelerate MPI collectives P Haghi, A Guo, Q Xiong, R Patel, C Yang, T Geng, JT Broaddus, ... 2020 IEEE High Performance Extreme Computing Conference (HPEC), 1-10, 2020 | 24 | 2020 |
Dissecting tensor cores via microbenchmarks: Latency, throughput and numeric behaviors W Sun, A Li, T Geng, S Stuijk, H Corporaal IEEE Transactions on Parallel and Distributed Systems 34 (1), 246-261, 2022 | 23 | 2022 |
G-CoS: GNN-accelerator co-search towards both better accuracy and efficiency Y Zhang, H You, Y Fu, T Geng, A Li, Y Lin 2021 IEEE/ACM International Conference On Computer Aided Design (ICCAD), 1-9, 2021 | 23 | 2021 |