Rafael Mitkov Rafailov
Graduate Student, Stanford University
Verified email at stanford.edu
Title
Cited by
Year
Direct preference optimization: Your language model is secretly a reward model
R Rafailov, A Sharma, E Mitchell, CD Manning, S Ermon, C Finn
Advances in Neural Information Processing Systems 36, 2024
Cited by: 417 · Year: 2024
Combo: Conservative offline model-based policy optimization
T Yu, A Kumar, R Rafailov, A Rajeswaran, S Levine, C Finn
Advances in neural information processing systems 34, 28954-28967, 2021
Cited by: 312 · Year: 2021
Offline reinforcement learning from images with latent space models
R Rafailov, T Yu, A Rajeswaran, C Finn
Learning for dynamics and control, 1154-1168, 2021
Cited by: 110 · Year: 2021
Offline meta-reinforcement learning with advantage weighting
E Mitchell, R Rafailov, XB Peng, S Levine, C Finn
International Conference on Machine Learning, 7780-7791, 2021
Cited by: 93 · Year: 2021
Open X-Embodiment: Robotic learning datasets and RT-X models
A Padalkar, A Pooley, A Jain, A Bewley, A Herzog, A Irpan, A Khazatsky, ...
arXiv preprint arXiv:2310.08864, 2023
Cited by: 66 · Year: 2023
Just ask for calibration: Strategies for eliciting calibrated confidence scores from language models fine-tuned with human feedback
K Tian, E Mitchell, A Zhou, A Sharma, R Rafailov, H Yao, C Finn, ...
arXiv preprint arXiv:2305.14975, 2023
Cited by: 58 · Year: 2023
Visual adversarial imitation learning using variational models
R Rafailov, T Yu, A Rajeswaran, C Finn
Advances in Neural Information Processing Systems 34, 3016-3028, 2021
Cited by: 34 · Year: 2021
Vision-based manipulators need to also see from their hands
K Hsu, MJ Kim, R Rafailov, J Wu, C Finn
arXiv preprint arXiv:2203.12677, 2022
Cited by: 26 · Year: 2022
On the sum of powered distances to certain sets of points on the circle
N Nikolov, R Rafailov
Pacific journal of mathematics 253 (1), 157-168, 2011
Cited by: 23 · Year: 2011
On extremums of sums of powered distances to a finite set of points
N Nikolov, R Rafailov
Geometriae Dedicata 167 (1), 69-89, 2013
Cited by: 18 · Year: 2013
Contrastive preference learning: Learning from human feedback without RL
J Hejna, R Rafailov, H Sikchi, C Finn, S Niekum, WB Knox, D Sadigh
arXiv preprint arXiv:2310.13639, 2023
Cited by: 11 · Year: 2023
Diffusion model alignment using direct preference optimization
B Wallace, M Dang, R Rafailov, L Zhou, A Lou, S Purushwalkam, S Ermon, ...
arXiv preprint arXiv:2311.12908, 2023
Cited by: 10 · Year: 2023
An emulator for fine-tuning large language models using small language models
E Mitchell, R Rafailov, A Sharma, C Finn, CD Manning
arXiv preprint arXiv:2310.12962, 2023
Cited by: 8 · Year: 2023
Open X-Embodiment: Robotic learning datasets and RT-X models
Q Vuong, S Levine, HR Walke, K Pertsch, A Singh, R Doshi, C Xu, J Luo, ...
Towards Generalist Robots: Learning Paradigms for Scalable Skill Acquisition …, 2023
Cited by: 5 · Year: 2023
MOTO: Offline pre-training to online fine-tuning for model-based robot learning
R Rafailov, KB Hatch, V Kolev, JD Martin, M Phielipp, C Finn
Conference on Robot Learning, 3654-3671, 2023
Cited by: 3 · Year: 2023
Offline retraining for online RL: Decoupled policy learning to mitigate exploration bias
MS Mark, A Sharma, F Tajwar, R Rafailov, S Levine, C Finn
arXiv preprint arXiv:2310.08558, 2023
Cited by: 2 · Year: 2023
Example-based offline reinforcement learning without rewards
K Hatch, T Yu, R Rafailov, C Finn
Proceedings of Machine Learning Research vol 144, 1-17, 2022
Cited by: 2 · Year: 2022
Language Model Detectors Are Easily Optimized Against
C Nicks, E Mitchell, R Rafailov, A Sharma, CD Manning, C Finn, S Ermon
The Twelfth International Conference on Learning Representations, 2023
Cited by: 1 · Year: 2023
MOTO: Offline to Online Fine-tuning for Model-Based Reinforcement Learning
R Rafailov, KB Hatch, V Kolev, JD Martin, M Phielipp, C Finn
Workshop on Reincarnating Reinforcement Learning at ICLR 2023, 2023
Cited by: 1 · Year: 2023
The Reflective Explorer: Online Meta-Exploration from Offline Data in Realistic Robotic Tasks
R Rafailov, VK Vijay, T Yu, A Singh, M Phielipp, C Finn
Deep RL Workshop NeurIPS 2021, 2021
Cited by: 1 · Year: 2021