Mask-Align: Self-supervised neural word alignment C Chen, M Sun, Y Liu arXiv preprint arXiv:2012.07162, 2020 | 29 | 2020 |
Position-enhanced visual instruction tuning for multimodal large language models C Chen, R Qin, F Luo, X Mi, P Li, M Sun, Y Liu arXiv preprint arXiv:2308.13437, 2023 | 24 | 2023 |
Filling the image information gap for VQA: Prompting large language models to proactively ask questions Z Wang, C Chen, P Li, Y Liu arXiv preprint arXiv:2311.11598, 2023 | 6 | 2023 |
End-to-End Unsupervised Vision-and-Language Pre-training with Referring Expression Matching C Chen, P Li, M Sun, Y Liu Proceedings of the 2022 Conference on Empirical Methods in Natural Language …, 2022 | 4 | 2022 |
CODIS: Benchmarking Context-Dependent Visual Comprehension for Multimodal Large Language Models F Luo*, C Chen*, Z Wan, Z Kang, Q Yan, Y Li, X Wang, S Wang, Z Wang, ... arXiv preprint arXiv:2402.13607, 2024 | | 2024 |
Model Composition for Multimodal Large Language Models C Chen, Y Du, Z Fang, Z Wang, F Luo, P Li, M Yan, J Zhang, F Huang, ... arXiv preprint arXiv:2402.12750, 2024 | | 2024 |
Browse and Concentrate: Comprehending Multimodal Content via prior-LLM Context Fusion Z Wang*, C Chen*, Y Zhu, F Luo, P Li, M Yan, J Zhang, F Huang, M Sun, ... arXiv preprint arXiv:2402.12195, 2024 | | 2024 |
Weakly Supervised Vision-and-Language Pre-training with Relative Representations C Chen, P Li, M Sun, Y Liu arXiv preprint arXiv:2305.15483, 2023 | | 2023 |