Peter Hase
Evaluating explainable AI: Which algorithmic explanations help users predict model behavior?
P Hase, M Bansal
arXiv preprint arXiv:2005.01831, 2020
Cited by 249
Open problems and fundamental limitations of reinforcement learning from human feedback
S Casper, X Davies, C Shi, TK Gilbert, J Scheurer, J Rando, R Freedman, ...
arXiv preprint arXiv:2307.15217, 2023
Cited by 137
Interpretable image recognition with hierarchical prototypes
P Hase, C Chen, O Li, C Rudin
Proceedings of the AAAI Conference on Human Computation and Crowdsourcing 7 …, 2019
Cited by 102
GrIPS: Gradient-free, edit-based instruction search for prompting large language models
A Prasad, P Hase, X Zhou, M Bansal
arXiv preprint arXiv:2203.07281, 2022
Cited by 88
Do language models have beliefs? Methods for detecting, updating, and visualizing model beliefs
P Hase, M Diab, A Celikyilmaz, X Li, Z Kozareva, V Stoyanov, M Bansal, ...
arXiv preprint arXiv:2111.13654, 2021
Cited by 77*
Leakage-adjusted simulatability: Can models generate non-trivial explanations of their behavior in natural language?
P Hase, S Zhang, H Xie, M Bansal
arXiv preprint arXiv:2010.04119, 2020
Cited by 74
FastIF: Scalable influence functions for efficient model interpretation and debugging
H Guo, NF Rajani, P Hase, M Bansal, C Xiong
arXiv preprint arXiv:2012.15781, 2020
Cited by 70
When can models learn from explanations? A formal framework for understanding the roles of explanation data
P Hase, M Bansal
arXiv preprint arXiv:2102.02201, 2021
Cited by 61
The Out-of-Distribution Problem in Explainability and Search Methods for Feature Importance Explanations
P Hase, H Xie, M Bansal
Advances in Neural Information Processing Systems 34, 2021
Cited by 60
Does localization inform editing? Surprising differences in causality-based localization vs. knowledge editing in language models
P Hase, M Bansal, B Kim, A Ghandeharioun
Advances in Neural Information Processing Systems 36, 2024
Cited by 45
Summarization programs: Interpretable abstractive summarization with neural modular trees
S Saha, S Zhang, P Hase, M Bansal
arXiv preprint arXiv:2209.10492, 2022
Cited by 14
Can Language Models Teach? Teacher Explanations Improve Student Performance via Personalization
S Saha, P Hase, M Bansal
Advances in Neural Information Processing Systems 36, 2024
Cited by 12*
Low-cost algorithmic recourse for users with uncertain cost functions
P Yadav, P Hase, M Bansal
arXiv preprint arXiv:2111.01235, 2021
Cited by 12
VisFIS: Visual feature importance supervision with right-for-the-right-reason objectives
Z Ying, P Hase, M Bansal
Advances in Neural Information Processing Systems 35, 17057-17072, 2022
Cited by 9
Can sensitive information be deleted from LLMs? Objectives for defending against extraction attacks
V Patil, P Hase, M Bansal
arXiv preprint arXiv:2309.17410, 2023
Cited by 7
Are hard examples also harder to explain? A study with human and model-generated explanations
S Saha, P Hase, N Rajani, M Bansal
arXiv preprint arXiv:2211.07517, 2022
Cited by 6
Rethinking Machine Unlearning for Large Language Models
S Liu, Y Yao, J Jia, S Casper, N Baracaldo, P Hase, X Xu, Y Yao, H Li, ...
arXiv preprint arXiv:2402.08787, 2024
Cited by 5
Shall I compare thee to a machine-written sonnet? An approach to algorithmic sonnet generation
J Benhardt, P Hase, L Zhu, C Rudin
arXiv preprint arXiv:1811.05067, 2018
Cited by 5
The unreasonable effectiveness of easy training data for hard tasks
P Hase, M Bansal, P Clark, S Wiegreffe
arXiv preprint arXiv:2401.06751, 2024
Cited by 1
Adaptive Contextual Perception: How to Generalize to New Backgrounds and Ambiguous Objects
Z Ying, P Hase, M Bansal
Advances in Neural Information Processing Systems 36, 2024
Articles 1–20