Tinghao Xie
Fine-tuning aligned language models compromises safety, even when users do not intend to!
X Qi, Y Zeng, T Xie, PY Chen, R Jia, P Mittal, P Henderson
ICLR 2024 (Oral), 2023 · Cited by 189

Revisiting the assumption of latent separability for backdoor defenses
X Qi, T Xie, Y Li, S Mahloujifar, P Mittal
ICLR 2023, 2022 · Cited by 81*

Towards practical deployment-stage backdoor attack on deep neural networks
X Qi, T Xie, R Pan, J Zhu, Y Yang, K Bu
CVPR 2022 (Oral), 13347-13357, 2022 · Cited by 54

Towards a proactive ML approach for detecting backdoor poison samples
X Qi, T Xie, JT Wang, T Wu, S Mahloujifar, P Mittal
32nd USENIX Security Symposium (USENIX Security 23), 1685-1702, 2023 · Cited by 29*

Assessing the brittleness of safety alignment via pruning and low-rank modifications
B Wei, K Huang, Y Huang, T Xie, X Qi, M Xia, P Mittal, M Wang, ...
ICML 2024, 2024 · Cited by 27

BaDExpert: Extracting Backdoor Functionality for Accurate Backdoor Input Detection
T Xie, X Qi, P He, Y Li, JT Wang, P Mittal
ICLR 2024, 2023 · Cited by 5

Fantastic Copyrighted Beasts and How (Not) to Generate Them
L He, Y Huang, W Shi, T Xie, H Liu, Y Wang, L Zettlemoyer, C Zhang, ...
arXiv preprint arXiv:2406.14526, 2024 · Cited by 1

SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal Behaviors
T Xie, X Qi, Y Zeng, Y Huang, UM Sehwag, K Huang, L He, B Wei, D Li, ...
arXiv preprint arXiv:2406.14598, 2024

AI Risk Management Should Incorporate Both Safety and Security
X Qi, Y Huang, Y Zeng, E Debenedetti, J Geiping, L He, K Huang, ...
arXiv preprint arXiv:2405.19524, 2024

A Handbook for Deep Learning with their Piecemeal Intuitions from Causal Theory
T Xie
2021

Ensemble of Narrow DNN Chains
T Xie
2021

Texture Packing
T Xie, H Lin, Z Zhao
2020

* Citation count is merged across multiple versions of the article on Google Scholar.