Chat-3D: Data-efficiently Tuning Large Language Model for Universal Dialogue of 3D Scenes Z Wang*, H Huang*, Y Zhao, Z Zhang, Z Zhao arXiv preprint arXiv:2308.08769, 2023 | 12 | 2023 |
Connecting Multi-modal Contrastive Representations Z Wang, Y Zhao, X Cheng, H Huang, J Liu, L Tang, L Li, Y Wang, A Yin, ... NeurIPS 2023, 2023 | 7 | 2023 |
Scene-robust Natural Language Video Localization via Learning Domain-invariant Representations Z Wang, Y Zhao, H Huang, Y Xia, Z Zhao ACL 2023, 2023 | 5 | 2023 |
Distilling Coarse-to-fine Semantic Matching Knowledge for Weakly Supervised 3D Visual Grounding Z Wang*, H Huang*, Y Zhao, L Li, X Cheng, Y Zhu, A Yin, Z Zhao ICCV 2023, 2023 | 5 | 2023 |
Towards Effective Multi-modal Interchanges in Zero-resource Sounding Object Localization Y Zhao*, C Zhang*, H Huang*, H Li, Z Zhao NeurIPS 2022, 2022 | 5 | 2022 |
3DRP-Net: 3D Relative Position-aware Network for 3D Visual Grounding Z Wang, H Huang, Y Zhao, L Li, X Cheng, Y Zhu, A Yin, Z Zhao EMNLP 2023, 2023 | 4 | 2023 |
Extending Multi-modal Contrastive Representations Z Wang, Z Zhang, L Liu, Y Zhao, H Huang, T Jin, Z Zhao arXiv preprint arXiv:2310.08884, 2023 | 1 | 2023 |
Multi-Modal Domain Adaptation Across Video Scenes for Temporal Video Grounding H Huang, Y Zhao, Z Wang, Y Xia, Z Zhao arXiv preprint arXiv:2312.13633, 2023 | | 2023 |
Chat-3D v2: Bridging 3D Scene and Large Language Models with Object Identifiers H Huang, Z Wang, R Huang, L Liu, X Cheng, Y Zhao, T Jin, Z Zhao arXiv preprint arXiv:2312.08168, 2023 | | 2023 |