Qwen2-VL: Enhancing vision-language model's perception of the world at any resolution | P Wang, S Bai, S Tan, S Wang, Z Fan, J Bai, K Chen, X Liu, J Wang, W Ge, ... | arXiv preprint arXiv:2409.12191, 2024 | 24 | 2024 |
ReForm-Eval: Evaluating large vision language models via unified re-formulation of task-oriented benchmarks | Z Li, Y Wang, M Du, Q Liu, B Wu, J Zhang, C Zhou, Z Fan, J Fu, J Chen, ... | arXiv preprint arXiv:2310.02569, 2023 | 3 | 2023 |