Georg Lange
Title
Cited by
Year
Is this the subspace you are looking for? An interpretability illusion for subspace activation patching
A Makelov, G Lange, A Geiger, N Nanda
The Twelfth International Conference on Learning Representations, 2023
Cited by 3
2023
An interpretability illusion for activation patching of arbitrary subspaces
G Lange, A Makelov, N Nanda
LessWrong, 2023
Cited by 3
2023
Quantifying Psychostimulant-induced Sensitization Effects on Dopamine and Acetylcholine Release across different Timescales
G Lange
2023
Reproducibility report for "Interpretable Complex-Valued Neural Networks for Privacy Protection"
A Sheverdin, N Corten, A Knijff, G Lange
ML Reproducibility Challenge 2020, 2021
2021
Towards Principled Evaluations of Sparse Autoencoders for Interpretability and Control
A Makelov, G Lange, N Nanda
ICLR 2024 Workshop on Secure and Trustworthy Large Language Models