Gabriel Mukobi
PhD Student, UC Berkeley; Fellow, RAND
Verified email at cs.stanford.edu
Title
Cited by
Year
The WMDP benchmark: Measuring and reducing malicious use with unlearning
N Li, A Pan, A Gopal, S Yue, D Berrios, A Gatti, JD Li, AK Dombrowski, ...
arXiv preprint arXiv:2403.03218, 2024
24 2024
Escalation risks from language models in military and diplomatic decision-making
JP Rivera, G Mukobi, A Reuel, M Lamparth, C Smith, J Schneider
The 2024 ACM Conference on Fairness, Accountability, and Transparency, 836-898, 2024
14 2024
Welfare diplomacy: Benchmarking language model cooperation
G Mukobi, H Erlebach, N Lauffer, L Hammond, A Chan, J Clifton
arXiv preprint arXiv:2310.08901, 2023
12 2023
Societal Adaptation to Advanced AI
J Bernardi, G Mukobi, H Greaves, L Heim, M Anderljung
arXiv preprint arXiv:2405.10295, 2024
3 2024
Why Has Predicting Downstream Capabilities of Frontier AI Models with Scale Remained Elusive?
R Schaeffer, H Schoelkopf, B Miranda, G Mukobi, V Madan, A Ibrahim, ...
arXiv preprint arXiv:2406.04391, 2024
2 2024
Assessing Risks of Using Autonomous Language Models in Military and Diplomatic Planning
G Mukobi, AK Reuel, JP Rivera, C Smith
Multi-Agent Security Workshop @ NeurIPS'23, 2023
1 2023
SuperHF: Supervised Iterative Learning from Human Feedback
G Mukobi, P Chatain, S Fong, R Windesheim, G Kutyniok, K Bhatia, ...
arXiv preprint arXiv:2310.16763, 2023
1 2023
Open Problems in Technical AI Governance
A Reuel, B Bucknall, S Casper, T Fist, L Soder, O Aarne, L Hammond, ...
arXiv preprint arXiv:2407.14981, 2024
2024
Opportunities in Physics Education: Low-Cost Position Tracking for Use in Kinematics Labs
PR DeStefano, C Siebert, R Perez-Franco, T Allen, G Mukobi, ...
2018