Peihao Chen

Cited by

	All	Since 2019
Citations	1195	1193
h-index	12	12
i10-index	12	12

	2019	2020	2021	2022	2023	2024
Citations per year	8	85	222	318	397	162

Public access

10 articles available
0 articles not available

Based on funding mandates

Co-authors

Chuang Gan	UMass Amherst | MIT-IBM Watson AI Lab	Verified email at csail.mit.edu
Mingkui Tan	South China University of Technology	Verified email at scut.edu.cn
Runhao Zeng	Shenzhen MSU-BIT University, Tenure-track Associate Professor	Verified email at smbu.edu.cn
Wenbing Huang	Associate Professor, Renmin University of China	Verified email at ruc.edu.cn
Antonio Torralba	Professor of Computer Science, MIT	Verified email at csail.mit.edu
David Cox	VP, AI Models; IBM Director, MIT-IBM Watson AI Lab, IBM Research	Verified email at ibm.com
Hang Zhao	Assistant Professor, Tsinghua University	Verified email at csail.mit.edu
Joshua B. Tenenbaum	MIT	Verified email at mit.edu
Qingyao Wu	School of Software Engineering, South China University of Technology	Verified email at scut.edu.cn
Guangyao Shen	Tsinghua University	Verified email at mails.tsinghua.edu.cn

Peihao Chen

Ph.D. candidate, South China University of Technology, UMass Amherst

Verified email at umass.edu

Research interests: Embodied AI, Multi-Modal, Video Understanding


Title	Cited by	Year
Dense regression network for video grounding. R Zeng, H Xu, W Huang, P Chen, M Tan, C Gan. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2020	263	2020
Location-aware graph convolutional networks for video question answering. D Huang, P Chen, R Zeng, Q Du, M Tan, C Gan. Proceedings of the AAAI Conference on Artificial Intelligence 34 (07), 11021 …, 2020	174	2020
Self-supervised moving vehicle tracking with stereo sound. C Gan, H Zhao, P Chen, D Cox, A Torralba. Proceedings of the IEEE/CVF international conference on computer vision …, 2019	159	2019
Foley music: Learning to generate music from videos. C Gan, D Huang, P Chen, JB Tenenbaum, A Torralba. Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23 …, 2020	118	2020
RSPNet: Relative Speed Perception for Unsupervised Video Representation Learning. P Chen, D Huang, D He, X Long, R Zeng, S Wen, M Tan, C Gan. AAAI Conference on Artificial Intelligence, 2021, 2020	106	2020
Breaking winner-takes-all: Iterative-winners-out networks for weakly supervised temporal action localization. R Zeng, C Gan, P Chen, W Huang, Q Wu, M Tan. IEEE Transactions on Image Processing 28 (12), 5797-5808, 2019	85	2019
Generating visually aligned sound from videos. P Chen, Y Zhang, M Tan, H Xiao, D Huang, C Gan. IEEE Transactions on Image Processing 29, 8292-8302, 2020	73	2020
Relation attention for temporal action localization. P Chen, C Gan, G Shen, W Huang, R Zeng, M Tan. IEEE Transactions on Multimedia 22 (10), 2723-2733, 2019	73	2019
3d-llm: Injecting the 3d world into large language models. Y Hong, H Zhen, P Chen, S Zheng, Y Du, Z Chen, C Gan. Advances in Neural Information Processing Systems 36, 20482-20494, 2023	67*	2023
Weakly-Supervised Multi-Granularity Map Learning for Vision-and-Language Navigation. P Chen, D Ji, K Lin, R Zeng, TH Li, M Tan, C Gan. NeurIPS 2022, 2022	31	2022
Masked motion encoding for self-supervised video representation learning. X Sun, P Chen, L Chen, C Li, TH Li, M Tan, C Gan. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2023	16	2023
Learning Active Camera for Multi-Object Navigation. P Chen, D Ji, K Lin, W Hu, W Huang, TH Li, M Tan, C Gan. NeurIPS 2022, 2022	14	2022
Vesper: A compact and effective pretrained model for speech emotion recognition. W Chen, X Xing, P Chen, X Xu. IEEE Transactions on Affective Computing, 2024	6	2024
A²Nav: Action-Aware Zero-Shot Robot Navigation by Exploiting Vision-and-Language Ability of Foundation Models. P Chen, X Sun, H Zhi, R Zeng, TH Li, G Liu, M Tan, C Gan. arXiv preprint arXiv:2308.07997, 2023	5	2023
CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding. J Li, D Chen, Y Hong, Z Chen, P Chen, Y Shen, C Gan. arXiv preprint arXiv:2311.03354, 2023	2	2023
3D-VLA: A 3D Vision-Language-Action Generative World Model. H Zhen, X Qiu, P Chen, J Yang, X Yan, Y Du, Y Hong, C Gan. arXiv preprint arXiv:2403.09631, 2024	1	2024
MultiPLY: A Multisensory Object-Centric Embodied Large Language Model in 3D World. Y Hong, Z Zheng, P Chen, Y Wang, J Li, C Gan. arXiv preprint arXiv:2401.08577, 2024	1	2024
Learning vision-and-language navigation from youtube videos. K Lin, P Chen, D Huang, TH Li, M Tan, C Gan. Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2023	1	2023
FGPrompt: Fine-grained Goal Prompting for Image-goal Navigation. X Sun, P Chen, J Fan, J Chen, T Li, M Tan. Advances in Neural Information Processing Systems 36, 2024		2024
A Simple Knowledge Distillation Framework for Open-world Object Detection. S Ma, Y Wang, Y Wei, J Fan, X Sun, P Chen, E Zhang. arXiv preprint arXiv:2312.08653, 2023		2023
