Fabien Roger
Redwood Research
Verified email at rdwrs.com
Title | Cited by | Year
AI control: Improving safety despite intentional subversion
R Greenblatt, B Shlegeris, K Sachan, F Roger
arXiv preprint arXiv:2312.06942, 2023
5 | 2023
Language models are better than humans at next-token prediction
B Shlegeris, F Roger, L Chan, E McLean
arXiv preprint arXiv:2212.11281, 2022
4 | 2022
Preventing language models from hiding their reasoning
F Roger, R Greenblatt
arXiv preprint arXiv:2310.18512, 2023
3 | 2023
Measurement tampering detection benchmark
F Roger, R Greenblatt, M Nadeau, B Shlegeris, N Thomas
arXiv preprint arXiv:2308.15605, 2023
2 | 2023
Large Language Models Sometimes Generate Purely Negatively-Reinforced Text
F Roger
arXiv preprint arXiv:2306.07567, 2023
2023