Rusheb Shah
Apollo Research
Verified email at apolloresearch.ai
Title
Cited by
Year
Scalable and Transferable Black-Box Jailbreaks for Language Models via Persona Modulation
R Shah, S Pour, A Tagade, S Casper, J Rando
arXiv preprint arXiv:2311.03348, 2023
Cited by: 26; Year: 2023
Linearly Structured World Representations in Maze-Solving Transformers
M Ivanitskiy, AF Spies, T Räuker, G Corlouer, C Mathwin, L Quirke, ...
UniReps: the First Workshop on Unifying Representations in Neural Models, 2023
Cited by: 1*; Year: 2023
A Configurable Library for Generating and Manipulating Maze Datasets
MI Ivanitskiy, R Shah, AF Spies, T Räuker, D Valentine, C Rager, L Quirke, ...
arXiv preprint arXiv:2309.10498, 2023
Cited by: 1; Year: 2023