A Survey on Efficient Inference for Large Language Models. Z Zhou, X Ning, K Hong, T Fu, J Xu, S Li, Y Lou, L Wang, Z Yuan, X Li, ... arXiv preprint arXiv:2404.14294, 2024 | 4 | 2024 |
Evaluating Quantized Large Language Models. S Li, X Ning, L Wang, T Liu, X Shi, S Yan, G Dai, H Yang, Y Wang. arXiv preprint arXiv:2402.18158, 2024 | 3 | 2024 |