Jie Lei 雷杰
Research Scientist, Meta AI
Verified email at fb.com
Title
Cited by
Year
Less is More: ClipBERT for Video-and-Language Learning via Sparse Sampling
J Lei*, L Li*, L Zhou, Z Gan, TL Berg, M Bansal, J Liu
CVPR 2021, Best Student Paper Honorable Mention, 2021
Cited by: 708; Year: 2021
TVQA: Localized, compositional video question answering
J Lei, L Yu, M Bansal, TL Berg
EMNLP 2018, 2018
Cited by: 677; Year: 2018
Unifying vision-and-language tasks via text generation
J Cho, J Lei, H Tan, M Bansal
ICML 2021, 2021
Cited by: 538; Year: 2021
TVR: A large-scale dataset for video-subtitle moment retrieval
J Lei, L Yu, TL Berg, M Bansal
ECCV 2020, 2020
Cited by: 274; Year: 2020
TVQA+: Spatio-temporal grounding for video question answering
J Lei, L Yu, TL Berg, M Bansal
ACL 2020, 2020
Cited by: 251; Year: 2020
MART: Memory-Augmented Recurrent Transformer for Coherent Video Paragraph Captioning
J Lei, L Wang, Y Shen, D Yu, TL Berg, M Bansal
ACL 2020, 2020
Cited by: 207; Year: 2020
QVHighlights: Detecting Moments and Highlights in Videos via Natural Language Queries
J Lei, TL Berg, M Bansal
NeurIPS 2021, 2021
Cited by: 202*; Year: 2021
Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners
Z Wang, M Li, R Xu, L Zhou, J Lei, X Lin, S Wang, Z Yang, C Zhu, ...
NeurIPS 2022, 2022
Cited by: 116; Year: 2022
Revealing single frame bias for video-and-language learning
J Lei, TL Berg, M Bansal
ACL 2023, 2022
Cited by: 114; Year: 2022
VALUE: A Multi-Task Benchmark for Video-and-Language Understanding Evaluation
L Li*, J Lei*, Z Gan, L Yu, YC Chen, R Pillai, Y Cheng, L Zhou, XE Wang, ...
NeurIPS 2021 Datasets and Benchmarks Track, 2021
Cited by: 113; Year: 2021
VIMPAC: Video Pre-Training via Masked Token Prediction and Contrastive Learning
H Tan*, J Lei*, T Wolf, M Bansal
CVPR 2022 workshop on Transformers for Vision, 2021
Cited by: 83; Year: 2021
VindLU: A Recipe for Effective Video-and-Language Pretraining
F Cheng, X Wang, J Lei, D Crandall, M Bansal, G Bertasius
CVPR 2023, 2022
Cited by: 74; Year: 2022
Adversarial VQA: A New Benchmark for Evaluating the Robustness of VQA Models
L Li, J Lei, Z Gan, J Liu
ICCV 2021, 2021
Cited by: 72; Year: 2021
DeCEMBERT: Learning from Noisy Instructional Videos via Dense Captions and Entropy Minimization
Z Tang*, J Lei*, M Bansal
NAACL 2021, 2021
Cited by: 67; Year: 2021
What is More Likely to Happen Next? Video-and-Language Future Event Prediction
J Lei, L Yu, TL Berg, M Bansal
EMNLP 2020, 2020
Cited by: 67; Year: 2020
Vision Transformers are Parameter-Efficient Audio-Visual Learners
YB Lin, YL Sung, J Lei, M Bansal, G Bertasius
CVPR 2023, 2022
Cited by: 64; Year: 2022
RESIN-11: Schema-guided event prediction for 11 newsworthy scenarios
X Du, Z Zhang, S Li, P Yu, H Wang, T Lai, X Lin, Z Wang, I Liu, B Zhou, ...
Proceedings of the 2022 Conference of the North American Chapter of the …, 2022
Cited by: 37; Year: 2022
ECLIPSE: Efficient Long-range Video Retrieval using Sight and Sound
YB Lin, J Lei, M Bansal, G Bertasius
ECCV 2022 Oral, 2022
Cited by: 37; Year: 2022
Weakly supervised image classification with coarse and fine labels
J Lei, Z Guo, Y Wang
2017 14th Conference on Computer and Robot Vision (CRV), 240-247, 2017
Cited by: 25; Year: 2017
LoopITR: Combining dual and cross encoder architectures for image-text retrieval
J Lei, X Chen, N Zhang, M Wang, M Bansal, TL Berg, L Yu
arXiv preprint arXiv:2203.05465, 2022
Cited by: 14; Year: 2022