Tony Wang
PhD student, MIT
Verified email at mit.edu
Title
Cited by
Year
Open problems and fundamental limitations of reinforcement learning from human feedback
S Casper, X Davies, C Shi, TK Gilbert, J Scheurer, J Rando, R Freedman, ...
arXiv preprint arXiv:2307.15217, 2023
Cited by: 158, Year: 2023
Adversarial Policies Beat Superhuman Go AIs
TT Wang, A Gleave, N Belrose, T Tseng, J Miller, MD Dennis, Y Duan, ...
arXiv preprint arXiv:2211.00241, 2022
Cited by: 38*, Year: 2022
Neural-guided, bidirectional program search for abstraction and reasoning
S Alford, A Gandhi, A Rangamani, A Banburski, T Wang, S Dandekar, ...
Complex Networks & Their Applications X: Volume 1, Proceedings of the Tenth …, 2022
Cited by: 14, Year: 2022
SDP Methods for Sensitivity-Constrained Privacy Funnel and Information Bottleneck Problems
Y Bu, T Wang, GW Wornell
2021 IEEE International Symposium on Information Theory (ISIT), 49-54, 2021
Cited by: 5, Year: 2021
Forbidden Facts: An Investigation of Competing Objectives in Llama-2
TT Wang, M Wang, K Hariharan, N Shavit
arXiv preprint arXiv:2312.08793, 2023
2023
Cliff-Learning
TT Wang, I Zablotchi, N Shavit, JS Rosenfeld
arXiv preprint arXiv:2302.07348, 2023
2023
Adversarial Examples in Simpler Settings
TT Wang
Massachusetts Institute of Technology, 2021
2021