Arthur Conmy

Cited by

	All	Since 2019
Citations	452	452
h-index	8	8
i10-index	7	7

Citations per year: 2022: 5; 2023: 127; 2024: 317

Co-authors

Alexandre Variengien - ENS de Lyon & EPFL - Verified email at ens-lyon.fr
Neel Nanda - Research Engineer, Google DeepMind - Verified email at deepmind.com
Jacob Steinhardt - Stanford University - Verified email at cs.stanford.edu
Adrià Garriga-Alonso - Research Scientist, FAR AI - Verified email at far.ai
Stefan Heimersheim - Institute of Astronomy, University of Cambridge - Verified email at cam.ac.uk
Aengus Lynch - University College London, MATS - Verified email at ucl.ac.uk
Janos Kramar - DeepMind - Verified email at google.com
Senthooran Rajamanoharan - Google DeepMind - Verified email at google.com
Nicholas Carlini - Google DeepMind - Verified email at google.com
Daniel Paleka - ETH Zurich - Verified email at inf.ethz.ch
Lewis Smith - PhD Student, University of Oxford - Verified email at kellogg.ox.ac.uk
Rohin Shah - Research Scientist, Google DeepMind - Verified email at deepmind.com
Can Rager - Independent - Verified email at northeastern.edu
Aaquib Syed - Student, University of Maryland - Verified email at umd.edu
Rhys Gould - Mathematics Undergraduate, University of Cambridge - Verified email at cam.ac.uk
Euan Ong - Research Assistant, University of Cambridge - Verified email at cam.ac.uk
Vikrant Varma - DeepMind - Verified email at deepmind.com
Rowan Wang - Verified email at rdwrs.com
Tom Lieberum - Google DeepMind - Verified email at deepmind.com
Itay Yona - Google DeepMind - Verified email at google.com

Arthur Conmy

Google DeepMind

Verified email at google.com

Mechanistic Interpretability, AI Safety


Title	Cited by	Year
Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 Small. K Wang, A Variengien, A Conmy, B Shlegeris, J Steinhardt. ICLR 2023, 2022	244	2022
Towards Automated Circuit Discovery for Mechanistic Interpretability. A Conmy, AN Mavor-Parker, A Lynch, S Heimersheim, A Garriga-Alonso. NeurIPS 2023 Spotlight, 2023	118	2023
Stealing Part of a Production Language Model. N Carlini, D Paleka, KD Dvijotham, T Steinke, J Hayase, AF Cooper, ... ICML 2024 Best Paper, 2024	21	2024
Attribution Patching Outperforms Automated Circuit Discovery. A Syed, C Rager, A Conmy. NeurIPS 2023 Workshop (Attributing Model Behavior at Scale), 2023	18	2023
Copy Suppression: Comprehensively Understanding an Attention Head. C McDougall, A Conmy, C Rushing, T McGrath, N Nanda. NeurIPS 2023 Workshop (Attributing Model Behavior at Scale), 2023	16	2023
Successor Heads: Recurring, Interpretable Attention Heads In The Wild. R Gould, E Ong, G Ogden, A Conmy. ICLR 2024, 2023	11	2023
Improving Dictionary Learning with Gated Sparse Autoencoders. S Rajamanoharan, A Conmy, L Smith, T Lieberum, V Varma, J Kramár, ... ICML 2024 Mechanistic Interpretability Workshop, 2024	10*	2024
Interpreting Attention Layer Outputs with Sparse Autoencoders. C Kissane, R Krzyzanowski, JI Bloom, A Conmy, N Nanda. ICML 2024 Mechanistic Interpretability Workshop Spotlight, 2024	9*	2024
StyleGAN-induced Data-Driven Regularization for Inverse Problems. A Conmy, S Mukherjee, CB Schönlieb. IEEE ICASSP 2022, 2022	5	2022
Jumping Ahead: Improving Reconstruction Fidelity with JumpReLU Sparse Autoencoders. S Rajamanoharan, T Lieberum, N Sonnerat, A Conmy, V Varma, J Kramár, ... arXiv preprint arXiv:2407.14435, 2024		2024
Activation Steering with SAEs. A Conmy, N Nanda. www.alignmentforum.org/posts/C5KAZQib3bzzpeyrg, 2024		2024
