Open problems and fundamental limitations of reinforcement learning from human feedback S Casper, X Davies, C Shi, TK Gilbert, J Scheurer, J Rando, R Freedman, ... arXiv preprint arXiv:2307.15217, 2023 | 271 | 2023 |
Training language models with language feedback at scale J Scheurer, JA Campos, T Korbak, JS Chan, A Chen, K Cho, E Perez arXiv preprint arXiv:2303.16755, 2023 | 71 | 2023 |
Training Language Models with Language Feedback J Scheurer, JA Campos, JS Chan, A Chen, K Cho, E Perez arXiv preprint arXiv:2204.14146, 2022 | 58* | 2022 |
Improving code generation by training with natural language feedback A Chen, J Scheurer, T Korbak, JA Campos, JS Chan, SR Bowman, K Cho, ... arXiv preprint arXiv:2303.16749, 2023 | 42 | 2023 |
Black-box access is insufficient for rigorous AI audits S Casper, C Ezell, C Siegmann, N Kolt, TL Curtis, B Bucknall, A Haupt, ... The 2024 ACM Conference on Fairness, Accountability, and Transparency, 2254-2272, 2024 | 19 | 2024 |
Large Language Models can Strategically Deceive their Users when Put Under Pressure J Scheurer, M Balesni, M Hobbhahn ICLR 2024 Workshop on Large Language Model (LLM) Agents, 2024 | 15* | 2024 |
Semantic Segmentation of Histopathological Slides for the Classification of Cutaneous Lymphoma and Eczema J Scheurer, C Ferrari, LBT Bom, M Beer, W Kempf, L Haug Annual Conference on Medical Image Understanding and Analysis, 26-42, 2020 | 15 | 2020 |
A Causal Framework for AI Regulation and Auditing L Sharkey, CN Ghuidhir, D Braun, J Scheurer, M Balesni, L Bushnaq, ... Preprints, 2024 | 9* | 2024 |
Instance-wise algorithm configuration with graph neural networks R Valentin, C Ferrari, J Scheurer, A Amrollahi, C Wendler, MB Paulus arXiv preprint arXiv:2202.04910, 2022 | 5 | 2022 |
Few-shot adaptation works with unpredictable data JS Chan, M Pieler, J Jao, J Scheurer, E Perez arXiv preprint arXiv:2208.01009, 2022 | 3 | 2022 |
Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs R Laine, B Chughtai, J Betley, K Hariharan, J Scheurer, M Balesni, ... arXiv preprint arXiv:2407.04694, 2024 | | 2024 |
Practical Pitfalls of Causal Scrubbing J Scheurer, H Philipp, M Tony, T Jacques, L David https://www.lesswrong.com/posts/DFarDnQjMnjsKvW8s/practical-pitfalls-of …, 2023 | | 2023 |
Meta Reward Learning for Recommender Systems: Towards Value Alignment J Scheurer | | 2021 |
Meta-Learning an Image Editing Style J Scheurer | | 2019 |
TracrBench: Generating Interpretability Test-Beds with Large Language Models H Thurnherr, J Scheurer ICML 2024 Workshop on Mechanistic Interpretability, 2024 | | 2024 |