Just ask: Learning to answer questions from millions of narrated videos A Yang, A Miech, J Sivic, I Laptev, C Schmid Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2021 | 265 | 2021 |
NAS evaluation is frustratingly hard A Yang, PM Esperança, FM Carlucci International Conference on Learning Representations, 2020 | 199 | 2020 |
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context M Reid, N Savinov, D Teplyashin, D Lepikhin, T Lillicrap, J Alayrac, ... arXiv preprint arXiv:2403.05530, 2024 | 196 | 2024 |
Zero-Shot Video Question Answering via Frozen Bidirectional Language Models A Yang, A Miech, J Sivic, I Laptev, C Schmid Advances in Neural Information Processing Systems 35, 124-141, 2022 | 168 | 2022 |
Vid2Seq: Large-scale pretraining of a visual language model for dense video captioning A Yang, A Nagrani, PH Seo, A Miech, J Pont-Tuset, I Laptev, J Sivic, ... Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2023 | 142 | 2023 |
TubeDETR: Spatio-Temporal Video Grounding with Transformers A Yang, A Miech, J Sivic, I Laptev, C Schmid Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2022 | 82 | 2022 |
MANAS: multi-agent neural architecture search V Lopes, FM Carlucci, P Esperanca, M Singh, A Yang, V Gabillon, H Xu, ... Machine Learning, 1-24, 2023 | 31* | 2023 |
Learning to Answer Visual Questions from Web Videos A Yang, A Miech, J Sivic, I Laptev, C Schmid IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022 | 28 | 2022 |
CoVR: Learning composed video retrieval from web video captions L Ventura, A Yang, C Schmid, G Varol Proceedings of the AAAI Conference on Artificial Intelligence 38 (6), 5270-5279, 2024 | 20 | 2024 |
VidChapters-7M: Video Chapters at Scale A Yang, A Nagrani, I Laptev, J Sivic, C Schmid Advances in Neural Information Processing Systems 36, 2023 | 13 | 2023 |
Just ask: Learning to answer questions from millions of narrated videos A Yang, A Miech, J Sivic, I Laptev, C Schmid 2021 IEEE/CVF International Conference on Computer Vision (ICCV), 1666-1677, 2020 | 8 | 2020 |
Learning Visual Language Models for Video Understanding A Yang Ecole Normale Superieure de Paris-ENS Paris, 2023 | | 2023 |
VidChapters-7M: Video Chapters at Scale Supplementary Material A Yang, A Nagrani, I Laptev, J Sivic, C Schmid | | |
Zero-Shot Video Question Answering via Frozen Bidirectional Language Models Supplementary Material A Yang, A Miech, J Sivic, I Laptev, C Schmid | | |
TubeDETR: Spatio-Temporal Video Grounding with Transformers Supplementary Material A Yang, A Miech, J Sivic, I Laptev, C Schmid | | |
Just Ask: Learning to Answer Questions from Millions of Narrated Videos Supplementary Material A Yang, A Miech, J Sivic, I Laptev, C Schmid | | |