| Concrete problems in AI safety D Amodei, C Olah, J Steinhardt, P Christiano, J Schulman, D Mané arXiv preprint arXiv:1606.06565, 2016 | 2008 | 2016 |
| Training language models to follow instructions with human feedback L Ouyang, J Wu, X Jiang, D Almeida, C Wainwright, P Mishkin, C Zhang, ... Advances in Neural Information Processing Systems 35, 27730-27744, 2022 | 1152 | 2022 |
| Theano: A Python framework for fast computation of mathematical expressions R Al-Rfou, G Alain, A Almahairi, C Angermueller, D Bahdanau, N Ballas, ... arXiv e-prints, arXiv:1605.02688, 2016 | 1070* | 2016 |
| Deep reinforcement learning from human preferences PF Christiano, J Leike, T Brown, M Martic, S Legg, D Amodei Advances in Neural Information Processing Systems 30, 2017 | 1033 | 2017 |
| Electrical flows, Laplacian systems, and faster approximation of maximum flow in undirected graphs P Christiano, JA Kelner, A Madry, DA Spielman, SH Teng Proceedings of the forty-third annual ACM symposium on Theory of computing …, 2011 | 379 | 2011 |
| A connection between generative adversarial networks, inverse reinforcement learning, and energy-based models C Finn, P Christiano, P Abbeel, S Levine arXiv preprint arXiv:1611.03852, 2016 | 340 | 2016 |
| Learning to summarize with human feedback N Stiennon, L Ouyang, J Wu, D Ziegler, R Lowe, C Voss, A Radford, ... Advances in Neural Information Processing Systems 33, 3008-3021, 2020 | 327 | 2020 |
| Fine-tuning language models from human preferences DM Ziegler, N Stiennon, J Wu, TB Brown, A Radford, D Amodei, ... arXiv preprint arXiv:1909.08593, 2019 | 297 | 2019 |
| Transfer from simulation to real world through learning deep inverse dynamics model P Christiano, Z Shah, I Mordatch, J Schneider, T Blackwell, J Tobin, ... arXiv preprint arXiv:1610.03518, 2016 | 225 | 2016 |
| Quantum money from hidden subspaces S Aaronson, P Christiano Proceedings of the forty-fourth annual ACM symposium on Theory of computing …, 2012 | 161 | 2012 |
| A cryptographic test of quantumness and certifiable randomness from a single quantum device Z Brakerski, P Christiano, U Mahadev, U Vazirani, T Vidick Journal of the ACM (JACM) 68 (5), 1-47, 2021 | 119 | 2021 |
| Recursively summarizing books with human feedback J Wu, L Ouyang, DM Ziegler, N Stiennon, R Lowe, J Leike, P Christiano arXiv preprint arXiv:2109.10862, 2021 | 84 | 2021 |
| AI safety via debate G Irving, P Christiano, D Amodei arXiv preprint arXiv:1805.00899, 2018 | 79 | 2018 |
| Unrestricted adversarial examples TB Brown, N Carlini, C Zhang, C Olsson, P Christiano, I Goodfellow arXiv preprint arXiv:1809.08352, 2018 | 76 | 2018 |
| Supervising strong learners by amplifying weak experts P Christiano, B Shlegeris, D Amodei arXiv preprint arXiv:1810.08575, 2018 | 46 | 2018 |
| Robust Cooperation in the Prisoner's Dilemma: Program Equilibrium via Provability Logic M Barasz, P Christiano, B Fallenstein, M Herreshoff, P LaVictoire, ... arXiv preprint arXiv:1401.5577, 2014 | 43* | 2014 |
| Online local learning via semidefinite programming P Christiano Proceedings of the forty-sixth annual ACM symposium on Theory of computing …, 2014 | 17 | 2014 |
| Non-omniscience, probabilistic inference, and metamathematics P Christiano Machine Intelligence Research Institute, Berkeley, CA, June, 2014 | 14* | 2014 |
| Reflective oracles: A foundation for game theory in artificial intelligence B Fallenstein, J Taylor, PF Christiano Logic, Rationality, and Interaction: 5th International Workshop, LORI 2015 …, 2015 | 11 | 2015 |
| Lossless fault-tolerant data structures with additive overhead P Christiano, ED Demaine, S Kishore Workshop on Algorithms and Data Structures, 243-254, 2011 | 8 | 2011 |