Max Kaufmann
UK AI Safety Institute
Verified email at dsit.gov.uk
Testing robustness against unforeseen adversaries
M Kaufmann, D Kang, Y Sun, D Hendrycks, T Brown, J Steinhardt
Cited by: 134 · Year: 2019
The Reversal Curse: LLMs trained on "A is B" fail to learn "B is A"
L Berglund, M Tong, M Kaufmann, M Balesni, AC Stickland, T Korbak, ...
ATTRIB Workshop, NeurIPS 2023, 2023
Cited by: 26 · Year: 2023
Efficient adversarial training with data pruning
M Kaufmann, Y Zhao, I Shumailov, R Mullins, N Papernot
arXiv preprint arXiv:2207.00694, 2022
Cited by: 4 · Year: 2022
Taken out of context: On measuring situational awareness in LLMs
L Berglund, AC Stickland, M Balesni, M Kaufmann, M Tong, T Korbak, ...
arXiv preprint arXiv:2309.00667, 2023
Cited by: 3 · Year: 2023
Visibility into AI Agents
A Chan, C Ezell, M Kaufmann, K Wei, L Hammond, H Bradley, E Bluemke, ...
arXiv preprint arXiv:2401.13138, 2024
Cited by: 1 · Year: 2024