Cross-modal background suppression for audio-visual event localization | Y Xia, Z Zhao | Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2022 | 36 | 2022 |
Video-guided curriculum learning for spoken video grounding | Y Xia, Z Zhao, S Ye, Y Zhao, H Li, Y Ren | Proceedings of the 30th ACM International Conference on Multimedia, 5191-5200, 2022 | 6 | 2022 |
Scene-robust natural language video localization via learning domain-invariant representations | Z Wang, Y Zhao, H Huang, Y Xia, Z Zhao | Findings of the Association for Computational Linguistics: ACL 2023, 144-160, 2023 | 5 | 2023 |
Achieving Cross Modal Generalization with Multimodal Unified Representation | Y Xia, H Huang, J Zhu, Z Zhao | Advances in Neural Information Processing Systems 36, 2024 | 2 | 2024 |
Cross-modal Prompts: Adapting Large Pre-trained Models for Audio-Visual Downstream Tasks | H Duan, Y Xia, M Zhou, L Tang, J Zhu, Z Zhao | NeurIPS 2023, 2023 | 2 | 2023 |
StyleSinger: Style Transfer for Out-of-Domain Singing Voice Synthesis | Y Zhang, R Huang, R Li, JZ He, Y Xia, F Chen, X Duan, B Huai, Z Zhao | Proceedings of the AAAI Conference on Artificial Intelligence 38 (17), 19597 …, 2024 | 1 | 2024 |
Unlocking the Potential of Multimodal Unified Discrete Representation through Training-Free Codebook Optimization and Hierarchical Alignment | H Huang, Y Xia, S Ji, S Wang, H Wang, J Zhu, Z Dong, Z Zhao | arXiv preprint arXiv:2403.05168, 2024 | | 2024 |