Nicholas Schiefer

Cited by

	All	Since 2019
Citations	2162	2150
h-index	17	17
i10-index	19	19

Citations per year: 2019: 6, 2020: 7, 2021: 11, 2022: 41, 2023: 850, 2024: 1227

Public access

5 articles available, 0 articles not available (based on funding mandates)

Co-authors

Zac Hatfield Dodds	Anthropic; Australian National University	Verified email at anu.edu.au
Jared Kaplan	Johns Hopkins University & Anthropic	Verified email at pha.jhu.edu
Carol Chen	Member of Technical Staff	Verified email at anthropic.com
Christopher Olah	Anthropic	Verified email at google.com
Robert Lasenby	Stanford University	Verified email at stanford.edu
Dario Amodei	CEO and Co-Founder at Anthropic	Verified email at anthropic.com
Catherine Olsson	Anthropic	Verified email at mit.edu
Dawn Drain	Microsoft	Verified email at microsoft.com
Roger Grosse	Associate Professor, University of Toronto	Verified email at cs.toronto.edu
Erik Winfree	California Institute of Technology	Verified email at caltech.edu
Shyam Narayanan	PhD Student, MIT	Verified email at mit.edu
Piotr Indyk	Professor of Electrical Engineering and Computer Science, MIT	Verified email at mit.edu
Kfir Lev-Ari	Apple	Verified email at alumni.technion.ac.il
Tao Lin	Meta Platforms, Inc.	Verified email at fb.com
Anders Aamand	University of Copenhagen	Verified email at mit.edu
Ronitt Rubinfeld	Professor of Computer Science, MIT and Tel Aviv University	Verified email at csail.mit.edu
Helen Xu	Georgia Institute of Technology	Verified email at gatech.edu
Daniel Jackson	MIT	Verified email at mit.edu
Geoffrey Litt	PhD Student, MIT	Verified email at mit.edu
Alexander Shraer

Nicholas Schiefer

Anthropic

Verified email at mit.edu


Title	Authors	Venue	Cited by	Year
Constitutional AI: Harmlessness from AI feedback	Y Bai, S Kadavath, S Kundu, A Askell, J Kernion, A Jones, A Chen, ...	arXiv preprint arXiv:2212.08073, 2022	747	2022
Red teaming language models to reduce harms: Methods, scaling behaviors, and lessons learned	D Ganguli, L Lovitt, J Kernion, A Askell, Y Bai, S Kadavath, B Mann, ...	arXiv preprint arXiv:2209.07858, 2022	290	2022
Toy models of superposition	N Elhage, T Hume, C Olsson, N Schiefer, T Henighan, S Kravec, ...	arXiv preprint arXiv:2209.10652, 2022	169	2022
Discovering language model behaviors with model-written evaluations	E Perez, S Ringer, K Lukošiūtė, K Nguyen, E Chen, S Heiner, C Pettit, ...	arXiv preprint arXiv:2212.09251, 2022	161	2022
The capacity for moral self-correction in large language models	D Ganguli, A Askell, N Schiefer, TI Liao, K Lukošiūtė, A Chen, A Goldie, ...	arXiv preprint arXiv:2302.07459, 2023	112	2023
Language models (mostly) know what they know	S Kadavath, T Conerly, A Askell, T Henighan, D Drain, E Perez, ...	arXiv preprint arXiv:2207.05221, 2022	104	2022
Towards monosemanticity: Decomposing language models with dictionary learning	T Bricken, A Templeton, J Batson, B Chen, A Jermyn, T Conerly, N Turner, ...	Transformer Circuits Thread 2, 2023	102	2023
Towards measuring the representation of subjective global opinions in language models	E Durmus, K Nguyen, TI Liao, N Schiefer, A Askell, A Bakhtin, C Chen, ...	arXiv preprint arXiv:2306.16388, 2023	95	2023
Towards understanding sycophancy in language models	M Sharma, M Tong, T Korbak, D Duvenaud, A Askell, SR Bowman, ...	arXiv preprint arXiv:2310.13548, 2023	67	2023
Measuring progress on scalable oversight for large language models	SR Bowman, J Hyun, E Perez, E Chen, C Pettit, S Heiner, K Lukošiūtė, ...	arXiv preprint arXiv:2211.03540, 2022	58	2022
Measuring faithfulness in chain-of-thought reasoning	T Lanham, A Chen, A Radhakrishnan, B Steiner, C Denison, ...	arXiv preprint arXiv:2307.13702, 2023	54	2023
Question decomposition improves the faithfulness of model-generated reasoning	A Radhakrishnan, K Nguyen, A Chen, C Chen, C Denison, D Hernandez, ...	arXiv preprint arXiv:2307.11768, 2023	36	2023
Universal Computation and Optimal Construction in the Chemical Reaction Network-Controlled Tile Assembly Model	N Schiefer, E Winfree	21st International Conference on DNA Computing and Molecular Programming …, 2015	26	2015
Sleeper agents: Training deceptive LLMs that persist through safety training	E Hubinger, C Denison, J Mu, M Lambert, M Tong, M MacDiarmid, ...	arXiv preprint arXiv:2401.05566, 2024	24	2024
FoundationDB Record Layer: A Multi-Tenant Structured Datastore	C Chrysafis, B Collins, S Dugas, J Dunkelberger, M Ehsan, S Gray, ...	Proceedings of the 2019 International Conference on Management of Data, 1787 …, 2019	23	2019
Superposition, memorization, and double descent	T Henighan, S Carter, T Hume, N Elhage, R Lasenby, S Fort, N Schiefer, ...	Transformer Circuits Thread 6, 24, 2023	18	2023
Exponentially improving the complexity of simulating the Weisfeiler-Lehman test with graph neural networks	A Aamand, J Chen, P Indyk, S Narayanan, R Rubinfeld, N Schiefer, ...	Advances in Neural Information Processing Systems 35, 27333-27346, 2022	17	2022
Many-shot jailbreaking	C Anil, E Durmus, M Sharma, J Benton, S Kundu, J Batson, N Rimsky, ...	Anthropic, April, 2024	12	2024
Specific versus general principles for constitutional AI	S Kundu, Y Bai, S Kadavath, A Askell, A Callahan, A Chen, A Goldie, ...	arXiv preprint arXiv:2310.13798, 2023	12	2023
Red teaming language models to reduce harms: Methods, scaling behaviors, and lessons learned	D Ganguli, L Lovitt, J Kernion, A Askell, Y Bai, S Kadavath, B Mann, ...	arXiv	9	2022
