Clement Neo
Verified email at e.ntu.edu.sg
Title | Cited by | Year
Increasing Trust in Language Models through the Reuse of Verified Circuits
P Quirke, C Neo, F Barez
arXiv preprint arXiv:2402.02619, 2024
Cited by 2 | 2024
Interpreting Context Look-ups in Transformers: Investigating Attention-MLP Interactions
C Neo, SB Cohen, F Barez
arXiv preprint arXiv:2402.15055, 2024
Cited by 1 | 2024
Interpreting Reward Models in RLHF-Tuned Language Models Using Sparse Autoencoders
L Marks, A Abdullah, L Mendez, R Arike, P Torr, F Barez
arXiv preprint arXiv:2310.08164, 2023
Cited by 1 | 2023