Wild-Time: A Benchmark of in-the-Wild Distribution Shift over Time H Yao, C Choi, B Cao, Y Lee, PWW Koh, C Finn Advances in Neural Information Processing Systems 35, 10309-10324, 2022 | 60 | 2022 |
Defending Against Alignment-Breaking Attacks via Robustly Aligned LLM B Cao, Y Cao, L Lin, J Chen The 62nd Annual Meeting of the Association for Computational Linguistics …, 2023 | 49 | 2023 |
OmniLytics: A Blockchain-Based Secure Data Market for Decentralized Machine Learning J Liang, S Li, B Cao, W Jiang, C He arXiv preprint arXiv:2107.05252, 2021 | 14 | 2021 |
On the Safety of Open-Sourced Large Language Models: Does Alignment Really Prevent Them From Being Misused? H Zhang, Z Guo, H Zhu, B Cao, L Lin, J Jia, J Chen, D Wu arXiv preprint arXiv:2310.01581, 2023 | 11 | 2023 |
Stealthy and Persistent Unalignment on Large Language Models via Backdoor Injections Y Cao, B Cao, J Chen arXiv preprint arXiv:2312.00027, 2023 | 8 | 2023 |
IMPRESS: Evaluating the Resilience of Imperceptible Perturbations Against Unauthorized Data Usage in Diffusion-Based Generative AI B Cao, C Li, T Wang, J Jia, B Li, J Chen The 37th Conference on Neural Information Processing Systems (NeurIPS), 2023 | 4 | 2023 |
WordGame: Efficient & Effective LLM Jailbreak via Simultaneous Obfuscation in Query and Response T Zhang, B Cao, Y Cao, L Lin, P Mitra, J Chen arXiv preprint arXiv:2405.14023, 2024 | 1 | 2024 |
Watch the Watcher! Backdoor Attacks on Security-Enhancing Diffusion Models C Li, R Pang, B Cao, J Chen, F Ma, S Ji, T Wang arXiv preprint arXiv:2406.09669, 2024 | | 2024 |
On the Intrinsic Self-Correction Capability of LLMs: Uncertainty and Latent Concept G Liu, H Mao, B Cao, Z Xue, K Johnson, J Tang, R Wang arXiv preprint arXiv:2406.02378, 2024 | | 2024 |
XPrompt: Explaining Large Language Model's Generation via Joint Prompt Attribution Y Chang, B Cao, Y Wang, J Chen, L Lin arXiv preprint arXiv:2405.20404, 2024 | | 2024 |
Personalized Steering of Large Language Models: Versatile Steering Vectors Through Bi-directional Preference Optimization Y Cao, T Zhang, B Cao, Z Yin, L Lin, F Ma, J Chen arXiv preprint arXiv:2406.00045, 2024 | | 2024 |
On the Difficulty of Defending Contrastive Learning against Backdoor Attacks C Li, R Pang, B Cao, Z Xi, J Chen, S Ji, T Wang The 33rd USENIX Security Symposium (USENIX Security '24), 2023 | | 2023 |
Backdoor Attack for Federated Learning with Fake Clients P Fang, B Cao, J Jia, J Chen | | |
A Change of Heart: Backdoor Attacks on Security-Centric Diffusion Models C Li, R Pang, B Cao, J Chen, T Wang | | |
Wild-Time: A Benchmark of in-the-Wild Distribution Shift over Time – Appendix H Yao, C Choi, B Cao, Y Lee, PW Koh, C Finn | | |