Auto-captions on GIF: A large-scale video-sentence dataset for vision-language pre-training. Y Pan, Y Li, J Luo, J Xu, T Yao, T Mei. Proceedings of the 30th ACM International Conference on Multimedia, 7070-7074, 2022 | 63 | 2022 |
CoCo-BERT: Improving video-language pre-training with contrastive cross-modal matching and denoising. J Luo, Y Li, Y Pan, T Yao, H Chao, T Mei. Proceedings of the 29th ACM International Conference on Multimedia, 5600-5608, 2021 | 38 | 2021 |
Semantic-conditional diffusion networks for image captioning. J Luo, Y Li, Y Pan, T Yao, J Feng, H Chao, T Mei. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2023 | 37 | 2023 |
Boosting vision-and-language navigation with direction guiding and backtracing. J Chen, J Luo, Y Pan, Y Li, T Yao, H Chao, T Mei. ACM Transactions on Multimedia Computing, Communications and Applications 19 …, 2023 | 3 | 2023 |