| Title | Authors | Venue | Cited by | Year |
|---|---|---|---|---|
| Gaussian Error Linear Units (GELUs) | D Hendrycks, K Gimpel | arXiv preprint arXiv:1606.08415, 2016 | 9093 | 2016 |
| Measuring Massive Multitask Language Understanding | D Hendrycks, C Burns, S Basart, A Zou, M Mazeika, D Song, J Steinhardt | International Conference on Learning Representations (ICLR), 2020 | 5914 | 2020 |
| Benchmarking Neural Network Robustness to Common Corruptions and Perturbations | D Hendrycks, T Dietterich | International Conference on Learning Representations (ICLR), 2019 | 4933 | 2019 |
| A Baseline for Detecting Misclassified and Out-of-Distribution Examples in Neural Networks | D Hendrycks, K Gimpel | International Conference on Learning Representations (ICLR), 2017 | 4808 | 2017 |
| Measuring Mathematical Problem Solving With the MATH Dataset | D Hendrycks, C Burns, S Kadavath, A Arora, S Basart, E Tang, D Song, ... | Neural Information Processing Systems (NeurIPS), 2021 | 3143 | 2021 |
| The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Generalization | D Hendrycks, S Basart, N Mu, S Kadavath, F Wang, E Dorundo, R Desai, ... | International Conference on Computer Vision (ICCV), 2020 | 2398 | 2020 |
| Deep Anomaly Detection with Outlier Exposure | D Hendrycks, M Mazeika, T Dietterich | International Conference on Learning Representations (ICLR), 2019 | 2103 | 2019 |
| Natural Adversarial Examples | D Hendrycks, K Zhao, S Basart, J Steinhardt, D Song | Conference on Computer Vision and Pattern Recognition (CVPR), 2019 | 2085 | 2019 |
| Beyond the Imitation Game: Quantifying and Extrapolating the Capabilities of Language Models | A Srivastava, A Rastogi, A Rao, AAM Shoeb, A Abid, A Fisch, AR Brown, ... | Transactions on Machine Learning Research (TMLR), 2023 | 2011 | 2023 |
| AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty | D Hendrycks, N Mu, ED Cubuk, B Zoph, J Gilmer, B Lakshminarayanan | International Conference on Learning Representations (ICLR), 2020 | 1991* | 2020 |
| Using Self-Supervised Learning Can Improve Model Robustness and Uncertainty | D Hendrycks, M Mazeika, S Kadavath, D Song | Neural Information Processing Systems (NeurIPS), 2019 | 1267 | 2019 |
| Using Pre-training Can Improve Model Robustness and Uncertainty | D Hendrycks, K Lee, M Mazeika | International Conference on Machine Learning (ICML), 2712-2721, 2019 | 1026 | 2019 |
| Measuring Coding Challenge Competence With APPS | D Hendrycks, S Basart, S Kadavath, M Mazeika, A Arora, E Guo, C Burns, ... | Neural Information Processing Systems (NeurIPS), 2021 | 864 | 2021 |
| Aligning AI With Shared Human Values | D Hendrycks, C Burns, S Basart, A Critch, J Li, D Song, J Steinhardt | International Conference on Learning Representations (ICLR), 2020 | 784 | 2020 |
| Scaling Out-of-Distribution Detection for Real-World Settings | D Hendrycks, S Basart, M Mazeika, M Mostajabi, J Steinhardt, D Song | International Conference on Machine Learning (ICML), 2022 | 755* | 2022 |
| Using Trusted Data to Train Deep Networks on Labels Corrupted by Severe Noise | D Hendrycks, M Mazeika, D Wilson, K Gimpel | Neural Information Processing Systems (NeurIPS), 2018 | 737 | 2018 |
| DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models | B Wang, W Chen, H Pei, C Xie, M Kang, C Zhang, C Xu, Z Xiong, R Dutta, ... | Neural Information Processing Systems (NeurIPS), 2023 | 622 | 2023 |
| Representation Engineering: A Top-Down Approach to AI Transparency | A Zou, L Phan, S Chen, J Campbell, P Guo, R Ren, A Pan, X Yin, ... | arXiv preprint arXiv:2310.01405, 2023 | 562 | 2023 |
| HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal | M Mazeika, L Phan, X Yin, A Zou, Z Wang, N Mu, E Sakhaee, N Li, ... | International Conference on Machine Learning (ICML), 2024 | 545 | 2024 |
| Pretrained Transformers Improve Out-of-Distribution Robustness | D Hendrycks, X Liu, E Wallace, A Dziedzic, R Krishnan, D Song | Association for Computational Linguistics (ACL), 2020 | 536 | 2020 |