Xander Davies
Verified email at college.harvard.edu
Title
Cited by
Year
Open problems and fundamental limitations of reinforcement learning from human feedback
S Casper*, X Davies*, C Shi, TK Gilbert, J Scheurer, J Rando, ...
arXiv preprint arXiv:2307.15217, 2023
Cited by: 156; Year: 2023
Unifying Grokking and Double Descent
X Davies*, L Langosco*, D Krueger
arXiv preprint arXiv:2303.06173, 2023
Cited by: 14; Year: 2023
Sparse distributed memory is a continual learner
T Bricken, X Davies, D Singh, D Krotov, G Kreiman
arXiv preprint arXiv:2303.11934, 2023
Cited by: 9; Year: 2023
Circuit Breaking: Removing Model Behaviors with Targeted Ablation
M Li*, X Davies*, M Nadeau*
Cited by: 7; Year: 2023
Discovering Variable Binding Circuitry with Desiderata
X Davies*, M Nadeau*, N Prakash*, TR Shaham, D Bau
arXiv preprint arXiv:2307.03637, 2023
Cited by: 4; Year: 2023
Delayed Generalization: Bridging Double Descent and Grokking
X Davies, J Hoogland, L Langosco, D Krueger
Year: 2023