Mengfei Du
Title
Cited by
Year
Qwen2-VL: Enhancing vision-language model's perception of the world at any resolution
P Wang, S Bai, S Tan, S Wang, Z Fan, J Bai, K Chen, X Liu, J Wang, W Ge, ...
arXiv preprint arXiv:2409.12191, 2024
Cited by: 24, Year: 2024
ReForm-Eval: Evaluating large vision language models via unified re-formulation of task-oriented benchmarks
Z Li, Y Wang, M Du, Q Liu, B Wu, J Zhang, C Zhou, Z Fan, J Fu, J Chen, ...
arXiv preprint arXiv:2310.02569, 2023
Cited by: 3, Year: 2023
EmbSpatial-Bench: Benchmarking Spatial Understanding for Embodied Tasks with Large Vision-Language Models
M Du, B Wu, Z Li, X Huang, Z Wei
arXiv preprint arXiv:2406.05756, 2024
Cited by: 1, Year: 2024
DELAN: Dual-Level Alignment for Vision-and-Language Navigation by Cross-Modal Contrastive Learning
M Du, B Wu, J Zhang, Z Fan, Z Li, R Luo, X Huang, Z Wei
arXiv preprint arXiv:2404.01994, 2024
Cited by: 1, Year: 2024