Masked contrastive pre-training for efficient video-text retrieval. F Shu, B Chen, Y Liao, S Xiao, W Sun, X Li, Y Zhu, J Wang, S Liu. arXiv preprint arXiv:2212.00986, 2022 | 6 | 2022 |
Audio-Visual LLM for Video Understanding. F Shu, L Zhang, H Jiang, C Xie. arXiv preprint arXiv:2312.06720, 2023 | 3 | 2023 |
HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models. W Zhang, T Lin, J Liu, F Shu, H Li, L Zhang, H Wanggui, H Zhou, Z Lv, ... arXiv preprint arXiv:2403.13447, 2024 | 1 | 2024 |
Multiple Transformer Mining for VizWiz Image Caption. X Gong, H Zhu, Y Wang, B Chen, A Zhang, F Shu, S Liu. 2021 VizWiz Grand Challenge Workshop, 2021 | 1 | 2021 |
Compress & Align: Curating Image-Text Data with Human Knowledge. L Zhang, F Shu, S Ren, B Zhao, H Jiang, C Xie. arXiv preprint arXiv:2312.06726, 2023 | | 2023 |