‪Meg Tong‬ - ‪Google Scholar‬

Cited by

	All	Since 2019
Citations	182	182
h-index	5	5
i10-index	4	4

Citations per year: 2023: 57, 2024: 123

Meg Tong

Anthropic

Verified email at anthropic.com

machine learning language model evaluation


Title	Cited by	Year
The Reversal Curse: LLMs trained on "A is B" fail to learn "B is A" L Berglund, M Tong, M Kaufmann, M Balesni, AC Stickland, T Korbak, ... arXiv preprint arXiv:2309.12288, 2023	82*	2023
Towards Understanding Sycophancy in Language Models M Sharma, M Tong, T Korbak, D Duvenaud, A Askell, SR Bowman, ... arXiv preprint arXiv:2310.13548, 2023	46	2023
Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training E Hubinger, C Denison, J Mu, M Lambert, M Tong, M MacDiarmid, ... arXiv preprint arXiv:2401.05566, 2024	25*	2024
Taken out of context: On measuring situational awareness in LLMs L Berglund, AC Stickland, M Balesni, M Kaufmann, M Tong, T Korbak, ... arXiv preprint arXiv:2309.00667, 2023	20*	2023
Steering Llama 2 via Contrastive Activation Addition N Rimsky, N Gabrieli, J Schulz, M Tong, E Hubinger, AM Turner arXiv preprint arXiv:2312.06681, 2023	8	2023
Many-shot Jailbreaking C Anil, E Durmus, M Sharma, J Benton, S Kundu, J Batson, N Rimsky, ...	1