Xiangyu QI
Verified email at princeton.edu - Homepage
Fine-tuning aligned language models compromises safety, even when users do not intend to!
X Qi, Y Zeng, T Xie, PY Chen, R Jia, P Mittal, P Henderson
International Conference on Learning Representations (ICLR), 2024 (Oral)
Cited by: 189 · Year: 2023
Visual Adversarial Examples Jailbreak Aligned Large Language Models
X Qi, K Huang, A Panda, P Henderson, M Wang, P Mittal
AAAI Conference on Artificial Intelligence, 2024 (Oral)
Cited by: 106* · Year: 2023
Revisiting the assumption of latent separability for backdoor defenses
X Qi, T Xie, Y Li, S Mahloujifar, P Mittal
International Conference on Learning Representations (ICLR), 2023
Cited by: 81* · Year: 2023
Towards Practical Deployment-Stage Backdoor Attack on Deep Neural Networks
X Qi, T Xie, R Pan, J Zhu, Y Yang, K Bu
Conference on Computer Vision and Pattern Recognition (CVPR), 2022 (Oral)
Cited by: 54 · Year: 2021
Knowledge Enhanced Machine Learning Pipeline against Diverse Adversarial Attacks
NM Gürel, X Qi, L Rimanic, C Zhang, B Li
International Conference on Machine Learning (ICML), 2021
Cited by: 35 · Year: 2021
Towards A Proactive ML Approach for Detecting Backdoor Poison Samples
X Qi, T Xie, JT Wang, T Wu, S Mahloujifar, P Mittal
32nd USENIX Security Symposium (USENIX Security 23), 1685–1702
Cited by: 29* · Year: 2023
Assessing the brittleness of safety alignment via pruning and low-rank modifications
B Wei, K Huang, Y Huang, T Xie, X Qi, M Xia, P Mittal, M Wang, ...
International Conference on Machine Learning (ICML), 2024
Cited by: 27 · Year: 2024
Subnet Replacement: Deployment-stage backdoor attack against deep neural networks in gray-box setting
X Qi, J Zhu, C Xie, Y Yang
ICLR Workshop, 2021
Cited by: 25 · Year: 2021
Mitigating fine-tuning jailbreak attack with backdoor enhanced alignment
J Wang, J Li, Y Li, X Qi, M Chen, J Hu, Y Li, B Li, C Xiao
arXiv preprint arXiv:2402.14968, 2024
Cited by: 12 · Year: 2024
BaDExpert: Extracting Backdoor Functionality for Accurate Backdoor Input Detection
T Xie, X Qi, P He, Y Li, JT Wang, P Mittal
International Conference on Learning Representations (ICLR), 2024
Cited by: 5 · Year: 2023
Uncovering Adversarial Risks of Test-Time Adaptation
T Wu, F Jia, X Qi, JT Wang, V Sehwag, S Mahloujifar, P Mittal
International Conference on Machine Learning (ICML), 2023
Cited by: 5 · Year: 2023
Safety Alignment Should Be Made More Than Just a Few Tokens Deep
X Qi, A Panda, K Lyu, X Ma, S Roy, A Beirami, P Mittal, P Henderson
arXiv preprint arXiv:2406.05946, 2024
Cited by: 2 · Year: 2024
Defensive prompt patch: A robust and interpretable defense of LLMs against jailbreak attacks
C Xiong, X Qi, PY Chen, TY Ho
arXiv preprint arXiv:2405.20099, 2024
Cited by: 2 · Year: 2024
Lottery Ticket Adaptation: Mitigating Destructive Interference in LLMs
A Panda, B Isik, X Qi, S Koyejo, T Weissman, P Mittal
arXiv preprint arXiv:2406.16797, 2024
Year: 2024
SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal Behaviors
T Xie, X Qi, Y Zeng, Y Huang, UM Sehwag, K Huang, L He, B Wei, D Li, ...
arXiv preprint arXiv:2406.14598, 2024
Year: 2024
AI Risk Management Should Incorporate Both Safety and Security
X Qi, Y Huang, Y Zeng, E Debenedetti, J Geiping, L He, K Huang, ...
arXiv preprint arXiv:2405.19524, 2024
Year: 2024