Tom Lieberum - Google Scholar

Cited by

	All	Since 2019
Citations	183	183
h-index	3	3
i10-index	2	2

Citations per year
Year	2022	2023	2024
Citations	3	106	74

Co-authors

Erik Jenner, UC Berkeley, verified email at berkeley.edu

Tom Lieberum

Google DeepMind

Verified email at deepmind.com

Interests: deep learning, large language models, interpretability


Title	Authors	Venue	Cited by	Year
Progress measures for grokking via mechanistic interpretability	N Nanda, L Chan, T Lieberum, J Smith, J Steinhardt	arXiv preprint arXiv:2301.05217, 2023	146	2023
Does circuit analysis interpretability scale? Evidence from multiple choice capabilities in Chinchilla	T Lieberum, M Rahtz, J Kramár, G Irving, R Shah, V Mikulik	arXiv preprint arXiv:2307.09458, 2023	26	2023
Retrospective on the 2021 MineRL BASALT competition on learning from human feedback	R Shah, SH Wang, C Wild, S Milani, A Kanervisto, VG Goecks, ...	NeurIPS 2021 Competitions and Demonstrations Track, 259-272, 2022	7	2022
AtP*: An efficient and scalable method for localizing LLM behaviour to components	J Kramár, T Lieberum, R Shah, N Nanda	arXiv preprint arXiv:2403.00745, 2024	3	2024
Retrospective on the 2021 BASALT Competition on Learning from Human Feedback	R Shah, SH Wang, C Wild, S Milani, A Kanervisto, VG Goecks, ...	arXiv preprint arXiv:2204.07123, 2022	1	2022
Replication: Fairness without demographics through Adversarially Reweighted Learning	E Jenner, T Lieberum, FP Nolte, N Rutsch			2021