Michal Valko
Llama @ Meta Paris & Inria & MVA - Ex: Gemini and BYOL @ Google DeepMind
Verified email at meta.com
Title / Cited by / Year
Human alignment of large language models through online preference optimisation
D Calandriello, D Guo, R Munos, M Rowland, Y Tang, BÁ Pires, ...
arXiv preprint arXiv:2403.08635, 2024
2024
Generalized preference optimization: A unified approach to offline alignment
Y Tang, ZD Guo, Z Zheng, D Calandriello, R Munos, M Rowland, ...
arXiv preprint arXiv:2402.05749, 2024
Cited by 2, 2024
Decoding-time realignment of language models
T Liu, S Guo, L Bianco, D Calandriello, Q Berthet, F Llinares, J Hoffmann, ...
arXiv preprint arXiv:2402.02992, 2024
2024
Nash learning from human feedback
R Munos, M Valko, D Calandriello, MG Azar, M Rowland, D Guo, Y Tang, ...
arXiv preprint arXiv:2312.00886, 2024
Cited by 14, 2024
Demonstration-regularized RL
D Tiapkin, D Belomestny, D Calandriello, E Moulines, A Naumov, ...
International Conference on Learning Representations, 2024
Cited by 2*, 2024
A general theoretical paradigm to understand learning from human preferences
MG Azar, M Rowland, B Piot, D Guo, D Calandriello, M Valko, R Munos
International Conference on Artificial Intelligence and Statistics, 2024
Cited by 53, 2024
Local and adaptive mirror descents in extensive-form games
C Fiegel, P Ménard, T Kozuno, R Munos, V Perchet, M Valko
arXiv preprint arXiv:2309.00656, 2024
Cited by 1, 2024
Unlocking the power of representations in long-term novelty-based exploration
A Saade, S Kapturowski, D Calandriello, C Blundell, P Sprechmann, ...
International Conference on Learning Representations, 2024
Cited by 4, 2024
Sharp deviations bounds for Dirichlet weighted sums with application to analysis of Bayesian algorithms
D Belomestny, P Ménard, A Naumov, D Tiapkin, M Valko
arXiv preprint arXiv:2304.03056, 2024
2024
Model-free posterior sampling via learning rate randomization
D Tiapkin, D Belomestny, D Calandriello, E Moulines, R Munos, ...
Neural Information Processing Systems, 2023
2023
Half-Hop: A graph upsampling approach for slowing down message passing
M Azabou, V Ganesh, S Thakoor, CH Lin, L Sathidevi, R Liu, M Valko, ...
International Conference on Machine Learning, 2023
Cited by 6, 2023
Quantile credit assignment
T Mesnard, W Chen, A Saade, Y Tang, M Rowland, T Weber, C Lyle, ...
International Conference on Machine Learning, 2023
Cited by 2, 2023
Curiosity in hindsight: Intrinsic exploration in stochastic environments
D Jarrett, C Tallec, F Altché, T Mesnard, R Munos, M Valko
International Conference on Machine Learning, 2023
Cited by 9, 2023
Fast rates for maximum entropy exploration
D Tiapkin, D Belomestny, D Calandriello, E Moulines, R Munos, ...
International Conference on Machine Learning, 2023
Cited by 7, 2023
Middle-mile logistics through the lens of goal-conditioned reinforcement learning
O Eberhard, T Cuvelier, M Valko, B De Backer
NeurIPS 2023 Workshop: Goal-conditioned RL, 2023
2023
DoMo-AC: Doubly multi-step off-policy actor-critic algorithm
Y Tang, T Kozuno, M Rowland, A Harutyunyan, R Munos, BÁ Pires, ...
International Conference on Machine Learning, 2023
2023
VA-learning as a more efficient alternative to Q-learning
Y Tang, R Munos, M Rowland, M Valko
International Conference on Machine Learning, 2023
Cited by 2, 2023
Regularization and variance-weighted regression achieves minimax optimality in linear MDPs: Theory and practice
T Kitamura, T Kozuno, Y Tang, N Vieillard, M Valko, W Yang, J Mei, ...
International Conference on Machine Learning, 2023
Cited by 2, 2023
Adapting to game trees in zero-sum imperfect information games
C Fiegel, P Ménard, T Kozuno, R Munos, V Perchet, M Valko
International Conference on Machine Learning, 2023
Cited by 4, 2023
Understanding self-predictive learning for reinforcement learning
Y Tang, ZD Guo, PH Richemond, BÁ Pires, Y Chandak, R Munos, ...
International Conference on Machine Learning, 2023
Cited by 20, 2023