A survey on efficient inference for large language models Z Zhou, X Ning, K Hong, T Fu, J Xu, S Li, Y Lou, L Wang, Z Yuan, X Li, ... arXiv preprint arXiv:2404.14294, 2024 | 10 | 2024 |
Evaluating quantized large language models S Li, X Ning, L Wang, T Liu, X Shi, S Yan, G Dai, H Yang, Y Wang arXiv preprint arXiv:2402.18158, 2024 | 8 | 2024 |
LLM-MQ: Mixed-precision quantization for efficient LLM deployment S Li, X Ning, K Hong, T Liu, L Wang, X Li, K Zhong, G Dai, H Yang, ... The Efficient Natural Language and Speech Processing Workshop with NeurIPS 9, 2023 | 5 | 2023 |