Image as a foreign language: BEiT pretraining for vision and vision-language tasks W Wang, H Bao, L Dong, J Bjorck, Z Peng, Q Liu, K Aggarwal, ... Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2023 | 624* | 2023 |
VLMo: Unified vision-language pre-training with mixture-of-modality-experts H Bao, W Wang, L Dong, Q Liu, OK Mohammed, K Aggarwal, S Som, ... Advances in Neural Information Processing Systems 35, 32897-32912, 2022 | 351 | 2022 |
Language is not all you need: Aligning perception with language models S Huang, L Dong, W Wang, Y Hao, S Singhal, S Ma, T Lv, L Cui, ... Advances in Neural Information Processing Systems 36, 2024 | 325* | 2024 |
GRAINS: Generative recursive autoencoders for indoor scenes M Li, AG Patil, K Xu, S Chaudhuri, O Khan, A Shamir, C Tu, B Chen, ... ACM Transactions on Graphics (TOG) 38 (2), 1-16, 2019 | 200 | 2019 |
ArK: Augmented Reality with Knowledge Interactive Emergent Ability Q Huang, JS Park, A Gupta, P Bennett, R Gong, S Som, B Peng, ... arXiv preprint arXiv:2305.00970, 2023 | 2 | 2023 |
Bootstrapping a high quality multilingual multimodal dataset for Bletchley OK Mohammed, K Aggarwal, Q Liu, S Singhal, J Bjorck, S Som Asian Conference on Machine Learning, 738-753, 2023 | 2 | 2023 |
DUBLIN: Visual Document Understanding By Language-Image Network K Aggarwal, A Khandelwal, K Tanmay, OK Mohammed, Q Liu, ... Proceedings of the 2023 Conference on Empirical Methods in Natural Language …, 2023 | 1* | 2023 |