Gabriel Mukobi

Cited by

	All	Since 2019
Citations	57	57
h-index	3	3
i10-index	3	3

Citations per year: 2023: 1; 2024: 56

Co-authors

Juan-Pablo Rivera - Georgia Institute of Technology - Verified email at gatech.edu
Anka Reuel - CS Ph.D. Student, Stanford University - Verified email at cs.stanford.edu
Max Lamparth - Postdoctoral Fellow, Stanford University - Verified email at stanford.edu
Jacquelyn Schneider - Stanford University - Verified email at stanford.edu
Niklas Lauffer - Ph.D. Student at UC Berkeley - Verified email at berkeley.edu
Lewis Hammond - University of Oxford - Verified email at cs.ox.ac.uk
Alan Chan - Centre for the Governance of AI; Mila, Université de Montréal - Verified email at mila.quebec
Hilary Greaves - Associate Professor in Philosophy, University of Oxford - Verified email at philosophy.ox.ac.uk
Markus Anderljung - Centre for the Governance of AI - Verified email at governance.ai
James Bernardi - Verified email at jamiebernardi.com
Lennart Heim - Centre for the Governance of AI - Verified email at governance.ai
Brando Miranda - MIT, UIUC, Stanford - Verified email at stanford.edu
Rylan Schaeffer - Stanford University - Verified email at stanford.edu
Hailey Schoelkopf - Researcher, EleutherAI - Verified email at eleuther.ai
Peter Chatain - Stanford University - Verified email at stanford.edu
Gitta Kutyniok - Bavarian AI Chair for Mathematical Foundations of Artificial Intelligence, LMU Munich - Verified email at math.lmu.de
Chandler Smith - Northeastern University - Verified email at northeastern.edu
Kush Bhatia - Stanford University - Verified email at berkeley.edu

Gabriel Mukobi

PhD Student, UC Berkeley; Fellow, RAND

Verified email at cs.stanford.edu

AI safety, evaluations, robustness, auditing, interpretability


Title	Cited by	Year
The WMDP benchmark: Measuring and reducing malicious use with unlearning - N Li, A Pan, A Gopal, S Yue, D Berrios, A Gatti, JD Li, AK Dombrowski, ... - arXiv preprint arXiv:2403.03218, 2024	24	2024
Escalation risks from language models in military and diplomatic decision-making - JP Rivera, G Mukobi, A Reuel, M Lamparth, C Smith, J Schneider - The 2024 ACM Conference on Fairness, Accountability, and Transparency, 836-898, 2024	14	2024
Welfare diplomacy: Benchmarking language model cooperation - G Mukobi, H Erlebach, N Lauffer, L Hammond, A Chan, J Clifton - arXiv preprint arXiv:2310.08901, 2023	12	2023
Societal Adaptation to Advanced AI - J Bernardi, G Mukobi, H Greaves, L Heim, M Anderljung - arXiv preprint arXiv:2405.10295, 2024	3	2024
Why Has Predicting Downstream Capabilities of Frontier AI Models with Scale Remained Elusive? - R Schaeffer, H Schoelkopf, B Miranda, G Mukobi, V Madan, A Ibrahim, ... - arXiv preprint arXiv:2406.04391, 2024	2	2024
Assessing Risks of Using Autonomous Language Models in Military and Diplomatic Planning - G Mukobi, AK Reuel, JP Rivera, C Smith - Multi-Agent Security Workshop @ NeurIPS'23, 2023	1	2023
SuperHF: Supervised Iterative Learning from Human Feedback - G Mukobi, P Chatain, S Fong, R Windesheim, G Kutyniok, K Bhatia, ... - arXiv preprint arXiv:2310.16763, 2023	1	2023
Open Problems in Technical AI Governance - A Reuel, B Bucknall, S Casper, T Fist, L Soder, O Aarne, L Hammond, ... - arXiv preprint arXiv:2407.14981, 2024		2024
Opportunities in Physics Education: Low-Cost Position Tracking for Use in Kinematics Labs - PR DeStefano, C Siebert, R Perez-Franco, T Allen, G Mukobi, ...		2018


