Max Kaufmann
UK AI Safety Institute
Verified email at dsit.gov.uk
Title
Cited by
Year
Testing robustness against unforeseen adversaries
M Kaufmann, D Kang, Y Sun, D Hendrycks, T Brown, J Steinhardt
Cited by: 145, Year: 2019
The Reversal Curse: LLMs trained on "A is B" fail to learn "B is A"
L Berglund, M Tong, M Kaufmann, M Balesni, AC Stickland, T Korbak, ...
arXiv preprint arXiv:2309.12288, 2023
Cited by: 125*, Year: 2023
Taken out of context: On measuring situational awareness in LLMs
L Berglund, AC Stickland, M Balesni, M Kaufmann, M Tong, T Korbak, ...
arXiv preprint arXiv:2309.00667, 2023
Cited by: 29*, Year: 2023
Visibility into AI Agents
A Chan, C Ezell, M Kaufmann, K Wei, L Hammond, H Bradley, E Bluemke, ...
The 2024 ACM Conference on Fairness, Accountability, and Transparency, 958-973, 2024
Cited by: 6, Year: 2024
Efficient adversarial training with data pruning
M Kaufmann, Y Zhao, I Shumailov, R Mullins, N Papernot
arXiv preprint arXiv:2207.00694, 2022
Cited by: 6, Year: 2022