Non-Autoregressive Coarse-to-Fine Video Captioning B Yang, Y Zou, F Liu, C Zhang In Proceedings of AAAI 2021, 2021 | 79* | 2021 |
O2NA: An object-oriented non-autoregressive approach for controllable video captioning F Liu, X Ren, X Wu, B Yang, S Ge, Y Zou, X Sun In Findings of ACL 2021, 2021 | 32 | 2021 |
CLIP Meets Video Captioning: Concept-Aware Representation Learning Does Matter B Yang, T Zhang, Y Zou In Proceedings of PRCV 2022 (Oral), 2022 | 23* | 2022 |
Unify, Align and Refine: Multi-Level Semantic Alignment for Radiology Report Generation Y Li, B Yang, X Cheng, Z Zhu, H Li, Y Zou In Proceedings of ICCV 2023, 2023 | 14 | 2023 |
Adaptive curriculum learning for video captioning S Li, B Yang, Y Zou IEEE Access 10, 31751-31759, 2022 | 12 | 2022 |
Retrieve, reason, and refine: Generating accurate and faithful patient instructions F Liu*, B Yang*, C You, X Wu, S Ge, Z Liu, X Sun, Y Yang, D Clifton In Proceedings of NeurIPS 2022, 2022 | 10 | 2022 |
A medical multimodal large language model for future pandemics F Liu, T Zhu, X Wu, B Yang, C You, C Wang, L Lu, Z Liu, Y Zheng, X Sun, ... NPJ Digital Medicine 6 (1), 226, 2023 | 9 | 2023 |
Concept-aware video captioning: Describing videos with effective prior information B Yang, M Cao, Y Zou IEEE Transactions on Image Processing, 2023 | 6 | 2023 |
ZeroNLG: Aligning and Autoencoding Domains for Zero-Shot Multimodal and Multilingual Natural Language Generation B Yang, F Liu, Y Zou, X Wu, Y Wang, DA Clifton arXiv preprint arXiv:2303.06458, 2023 | 5 | 2023 |
Graph-in-graph network for automatic gene ontology description generation F Liu, B Yang, C You, X Wu, S Ge, A Woicik, S Wang In Proceedings of KDD 2022 (Oral), 2022 | 5 | 2022 |
PCLmed at ImageCLEFmedical 2023: Customizing General-Purpose Foundation Models for Medical Report Generation B Yang, A Raza, Y Zou, T Zhang In Proceedings of CLEF 2023, 2023 | 4* | 2023 |
MultiCapCLIP: Auto-Encoding Prompts for Zero-Shot Multilingual Visual Captioning B Yang, F Liu, X Wu, Y Wang, Y Wang, Y Zou In Proceedings of ACL 2023, 2023 | 4 | 2023 |
WorldGPT: a Sora-inspired video AI agent as rich world models from text and image inputs D Yang, L Hu, Y Tian, Z Li, C Kelly, B Yang, C Yang, Y Zou arXiv preprint arXiv:2403.07944, 2024 | 3 | 2024 |
Visual oriented encoder: Integrating multimodal and multi-scale contexts for video captioning B Yang, Y Zou In Proceedings of ICPR 2020, 188-195, 2021 | 3 | 2021 |
VisionGPT: Vision-language understanding agent using generalized multimodal framework C Kelly, L Hu, B Yang, Y Tian, D Yang, C Yang, Z Huang, Z Li, J Hu, Y Zou arXiv preprint arXiv:2403.09027, 2024 | 2 | 2024 |
Consensus-Guided Keyword Targeting for Video Captioning P Ji, B Yang, T Zhang, Y Zou In Proceedings of PRCV 2022, 2022 | 2 | 2022 |
VisionGPT-3D: A Generalized Multimodal Agent for Enhanced 3D Vision Understanding C Kelly, L Hu, J Hu, Y Tian, D Yang, B Yang, C Yang, Z Li, Z Huang, Y Zou arXiv preprint arXiv:2403.09530, 2024 | 1 | 2024 |
Improving Medical Report Generation with Adapter Tuning and Knowledge Enhancement in Vision-Language Foundation Models S Wu, B Yang, Z Ye, H Wang, H Zheng, T Zhang arXiv preprint arXiv:2312.03970, 2023 | 1 | 2023 |
UnifiedVisionGPT: Streamlining Vision-Oriented AI through Generalized Multimodal Framework C Kelly, L Hu, C Yang, Y Tian, D Yang, B Yang, Z Huang, Z Li, Y Zou arXiv preprint arXiv:2311.10125, 2023 | 1 | 2023 |
Multimodal Prompt Learning for Product Title Generation with Extremely Limited Labels B Yang, F Liu, Z Li, Q Yin, C You, B Yin, Y Zou In Findings of ACL 2023, 2023 | 1 | 2023 |