Max Nadeau

Citations per year: 2022: 5; 2023: 88; 2024: 101

Xander Davies, Harvard University, verified email at college.harvard.edu
Stephen Casper, PhD student, MIT, verified email at mit.edu
Dylan Hadfield-Menell, Massachusetts Institute of Technology, verified email at csail.mit.edu
David Bau, Assistant Professor at Northeastern University, verified email at northeastern.edu
Nikhil Prakash, Northeastern University, verified email at northeastern.edu
Buck Shlegeris, CTO, Redwood Research, verified email at rdwrs.com
Fabien Roger, Redwood Research, verified email at rdwrs.com

Max Nadeau

Verified email at college.harvard.edu


Title	Cited by	Year
Open problems and fundamental limitations of reinforcement learning from human feedback. S Casper, X Davies, C Shi, TK Gilbert, J Scheurer, J Rando, R Freedman, ... arXiv preprint arXiv:2307.15217, 2023.	158	2023
Robust feature-level adversaries are interpretability tools. S Casper, M Nadeau, D Hadfield-Menell, G Kreiman. Advances in Neural Information Processing Systems 35, 33093-33106, 2022.	23	2022
Circuit breaking: Removing model behaviors with targeted ablation. M Li, X Davies, M Nadeau. arXiv preprint arXiv:2309.05973, 2023.	7	2023
Discovering variable binding circuitry with desiderata. X Davies, M Nadeau, N Prakash, TR Shaham, D Bau. arXiv preprint arXiv:2307.03637, 2023.	4	2023
Measurement tampering detection benchmark. F Roger, R Greenblatt, M Nadeau, B Shlegeris, N Thomas. arXiv preprint arXiv:2308.15605, 2023.	2	2023


