Max Kaufmann

Citations per year: 2019 (4), 2020 (30), 2021 (33), 2022 (32), 2023 (82), 2024 (130)

Meg Tong, Anthropic. Verified email at anthropic.com
Dan Hendrycks, Director of the Center for AI Safety. Verified email at berkeley.edu
Daniel Kang, UIUC. Verified email at illinois.edu
Owain Evans, Research Associate, University of Oxford. Verified email at philosophy.ox.ac.uk
Tomasz Korbak, Anthropic. Verified email at anthropic.com
Ilia Shumailov, Google DeepMind. Verified email at google.com
Nicolas Papernot, University of Toronto and Vector Institute. Verified email at utoronto.ca
Robert Mullins, Department of Computer Science and Technology, University of Cambridge. Verified email at cl.cam.ac.uk
Yiren (Aaron) Zhao, Imperial College London, University of Cambridge. Verified email at imperial.ac.uk

Max Kaufmann

UK AI Safety Institute

Verified email at dsit.gov.uk


Title	Cited by	Year
Testing robustness against unforeseen adversaries. M Kaufmann, D Kang, Y Sun, D Hendrycks, T Brown, J Steinhardt	145	2019
The Reversal Curse: LLMs trained on "A is B" fail to learn "B is A". L Berglund, M Tong, M Kaufmann, M Balesni, AC Stickland, T Korbak, ... arXiv preprint arXiv:2309.12288, 2023	125*	2023
Taken out of context: On measuring situational awareness in LLMs. L Berglund, AC Stickland, M Balesni, M Kaufmann, M Tong, T Korbak, ... arXiv preprint arXiv:2309.00667, 2023	29*	2023
Visibility into AI Agents. A Chan, C Ezell, M Kaufmann, K Wei, L Hammond, H Bradley, E Bluemke, ... The 2024 ACM Conference on Fairness, Accountability, and Transparency, 958-973, 2024	6	2024
Efficient adversarial training with data pruning. M Kaufmann, Y Zhao, I Shumailov, R Mullins, N Papernot. arXiv preprint arXiv:2207.00694, 2022	6	2022


