Xiangyu QI
Verified email at princeton.edu - Homepage
Fine-tuning aligned language models compromises safety, even when users do not intend to!
X Qi, Y Zeng, T Xie, PY Chen, R Jia, P Mittal, P Henderson
International Conference on Learning Representations (ICLR), 2024 (Oral)
Cited by: 189 · Year: 2023
Visual Adversarial Examples Jailbreak Aligned Large Language Models
X Qi, K Huang, A Panda, P Henderson, M Wang, P Mittal
AAAI Conference on Artificial Intelligence, 2024 (Oral)
Cited by: 106* · Year: 2023
Revisiting the assumption of latent separability for backdoor defenses
X Qi, T Xie, Y Li, S Mahloujifar, P Mittal
International Conference on Learning Representations (ICLR), 2023
Cited by: 81* · Year: 2023
Towards Practical Deployment-Stage Backdoor Attack on Deep Neural Networks
X Qi, T Xie, R Pan, J Zhu, Y Yang, K Bu
Conference on Computer Vision and Pattern Recognition (CVPR), 2022 (Oral)
Cited by: 54 · Year: 2021
Knowledge Enhanced Machine Learning Pipeline against Diverse Adversarial Attacks
NM Gürel, X Qi, L Rimanic, C Zhang, B Li
International Conference on Machine Learning (ICML), 2021
Cited by: 35 · Year: 2021
Towards A Proactive ML Approach for Detecting Backdoor Poison Samples
X Qi, T Xie, JT Wang, T Wu, S Mahloujifar, P Mittal
32nd USENIX Security Symposium (USENIX Security 23), 1685–1702
Cited by: 29* · Year: 2023
Assessing the brittleness of safety alignment via pruning and low-rank modifications
B Wei, K Huang, Y Huang, T Xie, X Qi, M Xia, P Mittal, M Wang, ...
International Conference on Machine Learning (ICML), 2024
Cited by: 27 · Year: 2024
Subnet Replacement: Deployment-stage backdoor attack against deep neural networks in gray-box setting
X Qi, J Zhu, C Xie, Y Yang
ICLR Workshop, 2021
Cited by: 25 · Year: 2021
Mitigating fine-tuning jailbreak attack with backdoor enhanced alignment
J Wang, J Li, Y Li, X Qi, M Chen, J Hu, Y Li, B Li, C Xiao
arXiv preprint arXiv:2402.14968, 2024
Cited by: 12 · Year: 2024
BaDExpert: Extracting Backdoor Functionality for Accurate Backdoor Input Detection
T Xie, X Qi, P He, Y Li, JT Wang, P Mittal
International Conference on Learning Representations (ICLR), 2024
Cited by: 5 · Year: 2023
Uncovering Adversarial Risks of Test-Time Adaptation
T Wu, F Jia, X Qi, JT Wang, V Sehwag, S Mahloujifar, P Mittal
International Conference on Machine Learning (ICML), 2023
Cited by: 5 · Year: 2023
Safety Alignment Should Be Made More Than Just a Few Tokens Deep
X Qi, A Panda, K Lyu, X Ma, S Roy, A Beirami, P Mittal, P Henderson
arXiv preprint arXiv:2406.05946, 2024
Cited by: 2 · Year: 2024
Defensive prompt patch: A robust and interpretable defense of LLMs against jailbreak attacks
C Xiong, X Qi, PY Chen, TY Ho
arXiv preprint arXiv:2405.20099, 2024
Cited by: 2 · Year: 2024
Lottery Ticket Adaptation: Mitigating Destructive Interference in LLMs
A Panda, B Isik, X Qi, S Koyejo, T Weissman, P Mittal
arXiv preprint arXiv:2406.16797, 2024
Year: 2024
SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal Behaviors
T Xie, X Qi, Y Zeng, Y Huang, UM Sehwag, K Huang, L He, B Wei, D Li, ...
arXiv preprint arXiv:2406.14598, 2024
Year: 2024
AI Risk Management Should Incorporate Both Safety and Security
X Qi, Y Huang, Y Zeng, E Debenedetti, J Geiping, L He, K Huang, ...
arXiv preprint arXiv:2405.19524, 2024
Year: 2024