Training verifiers to solve math word problems K Cobbe, V Kosaraju, M Bavarian, M Chen, H Jun, L Kaiser, M Plappert, ... arXiv preprint arXiv:2110.14168, 2021 | 1013 | 2021 |
Quantifying generalization in reinforcement learning K Cobbe, O Klimov, C Hesse, T Kim, J Schulman International conference on machine learning, 1282-1289, 2019 | 653 | 2019 |
WebGPT: Browser-assisted question-answering with human feedback R Nakano, J Hilton, S Balaji, J Wu, L Ouyang, C Kim, C Hesse, S Jain, ... arXiv preprint arXiv:2112.09332, 2021 | 651 | 2021 |
Leveraging procedural generation to benchmark reinforcement learning K Cobbe, C Hesse, J Hilton, J Schulman International conference on machine learning, 2048-2056, 2020 | 492 | 2020 |
System and method for activity management presentation Y Shoham, JE Bank, K Cobbe, A Matta, M Rubin, ZI Weiner, KT Toft US Patent App. 14/076,046, 2015 | 212 | 2015 |
Let's Verify Step by Step H Lightman, V Kosaraju, Y Burda, H Edwards, B Baker, T Lee, J Leike, ... arXiv preprint arXiv:2305.20050, 2023 | 178 | 2023 |
Phasic policy gradient KW Cobbe, J Hilton, O Klimov, J Schulman International Conference on Machine Learning, 2020-2027, 2021 | 146 | 2021 |
Training verifiers to solve math word problems, 2021 K Cobbe, V Kosaraju, M Bavarian, M Chen, H Jun, L Kaiser, M Plappert, ... URL https://arxiv.org/abs/2110.14168, 2021 | 37 | 2021 |
Measuring sample efficiency and generalization in reinforcement learning benchmarks: NeurIPS 2020 Procgen benchmark S Mohanty, J Poonganam, A Gaidon, A Kolobov, B Wulfe, D Chakraborty, ... arXiv preprint arXiv:2103.15332, 2021 | 18 | 2021 |
Batch size-invariance for policy optimization J Hilton, K Cobbe, J Schulman Advances in Neural Information Processing Systems 35, 17086-17098, 2022 | 11 | 2022 |
Event scheduling presentation in a graphical user interface environment Y Shoham, JE Bank, K Cobbe, A Matta, M Rubin, ZI Weiner, KT Toft US Patent 10,088,973, 2018 | 1 | 2018 |