Yupan Huang
Title · Cited by · Year
LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking
Y Huang, T Lv, L Cui, Y Lu, F Wei
Proceedings of the 30th ACM International Conference on Multimedia, 2022
Cited by 262 · 2022
Seeing out of the box: End-to-end pre-training for vision-language representation learning
Z Huang*, Z Zeng*, Y Huang*, B Liu, D Fu, J Fu
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2021
Cited by 237 · 2021
Probing inter-modality: Visual parsing with self-attention for vision-and-language pre-training
H Xue, Y Huang, B Liu, H Peng, J Fu, H Li, J Luo
Advances in Neural Information Processing Systems 34, 4514-4528, 2021
Cited by 75 · 2021
Unifying multimodal transformer for bi-directional image and text generation
Y Huang, H Xue, B Liu, Y Lu
Proceedings of the 29th ACM International Conference on Multimedia, 1138-1147, 2021
Cited by 54 · 2021
Decoupling localization and classification in single shot temporal action detection
Y Huang, Q Dai, Y Lu
2019 IEEE International Conference on Multimedia and Expo (ICME), 1288-1293, 2019
Cited by 54 · 2019
Reinforced short-length hashing
X Liu, X Nie, Q Dai, Y Huang, L Lian, Y Yin
IEEE Transactions on Circuits and Systems for Video Technology 31 (9), 3655-3668, 2020
Cited by 21 · 2020
TextDiffuser: Diffusion Models as Text Painters
J Chen*, Y Huang*, T Lv, L Cui, Q Chen, F Wei
NeurIPS, 2023
Cited by 15 · 2023
Kosmos-2.5: A Multimodal Literate Model
T Lv*, Y Huang*, J Chen*, L Cui*, S Ma, Y Chang, S Huang, W Wang, ...
arXiv preprint arXiv:2309.11419, 2023
Cited by 14 · 2023
Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models
Y Huang, Z Meng, F Liu, Y Su, N Collier, Y Lu
arXiv preprint arXiv:2308.16463, 2023
Cited by 11 · 2023
A picture is worth a thousand words: A unified system for diverse captions and rich images generation
Y Huang, B Liu, J Fu, Y Lu
Proceedings of the 29th ACM International Conference on Multimedia, 2792-2794, 2021
Cited by 8 · 2021
Be specific, be clear: Bridging machine and human captions by scene-guided transformer
Y Huang, Z Zeng, Y Lu
Proceedings of the 2021 Workshop on Multi-Modal Pre-Training for Multimedia …, 2021
Cited by 5 · 2021
TextDiffuser-2: Unleashing the Power of Language Models for Text Rendering
J Chen, Y Huang, T Lv, L Cui, Q Chen, F Wei
arXiv preprint arXiv:2311.16465, 2023
Cited by 3 · 2023