Tony Wang
PhD student, MIT
Verified email at mit.edu
Title
Cited by
Year
Open problems and fundamental limitations of reinforcement learning from human feedback
S Casper, X Davies, C Shi, TK Gilbert, J Scheurer, J Rando, R Freedman, ...
arXiv preprint arXiv:2307.15217, 2023
Cited by: 259; Year: 2023
Adversarial Policies Beat Superhuman Go AIs
TT Wang, A Gleave, N Belrose, T Tseng, J Miller, MD Dennis, Y Duan, ...
arXiv preprint arXiv:2211.00241, 2022
Cited by: 45*; Year: 2022
Neural-guided, bidirectional program search for abstraction and reasoning
S Alford, A Gandhi, A Rangamani, A Banburski, T Wang, S Dandekar, ...
Complex Networks & Their Applications X: Volume 1, Proceedings of the Tenth …, 2022
Cited by: 15; Year: 2022
SDP Methods for Sensitivity-Constrained Privacy Funnel and Information Bottleneck Problems
Y Bu, T Wang, GW Wornell
2021 IEEE International Symposium on Information Theory (ISIT), 49-54, 2021
Cited by: 6; Year: 2021
Forbidden Facts: An Investigation of Competing Objectives in Llama-2
TT Wang, M Wang, K Hariharan, N Shavit
arXiv preprint arXiv:2312.08793, 2023
Cited by: 2; Year: 2023
Covert Malicious Finetuning: Challenges in Safeguarding LLM Adaptation
D Halawi, A Wei, E Wallace, TT Wang, N Haghtalab, J Steinhardt
arXiv preprint arXiv:2406.20053, 2024
2024
Can Go AIs be adversarially robust?
T Tseng, E McLean, K Pelrine, TT Wang, A Gleave
arXiv preprint arXiv:2406.12843, 2024
2024
A connectomics-driven analysis reveals novel characterization of border regions in mouse visual cortex
N Tumma, L Kong, S Sawmya, TT Wang, N Shavit
bioRxiv, 2024.05.24.595837, 2024
2024
Cliff-Learning
TT Wang, I Zablotchi, N Shavit, JS Rosenfeld
arXiv preprint arXiv:2302.07348, 2023
2023
Adversarial Examples in Simpler Settings
TT Wang
Massachusetts Institute of Technology, 2021
2021