Michal Valko

Cited by

	All	Since 2019
Citations	10887	10042
h-index	42	37
i10-index	97	91

Citations per year

Year	Citations
2011	36
2012	26
2013	63
2014	61
2015	108
2016	141
2017	165
2018	199
2019	318
2020	608
2021	1430
2022	2771
2023	3561
2024	1343

Public access

54 articles available
0 articles not available

Based on funding mandates

Co-authors

Rémi Munos - DeepMind - Verified email at inria.fr
Mohammad Gheshlaghi Azar - Cohere - Verified email at google.com
Bilal Piot - Google Deepmind - Verified email at google.com
Corentin Tallec - DeepMind - Verified email at google.com
Jean-bastien Grill - Verified email at google.com
Zhaohan Daniel Guo - DeepMind - Verified email at google.com
Daniele Calandriello - Research Scientist, DeepMind - Verified email at google.com
Pierre Ménard - OvGU Magdeburg - Verified email at inria.fr
Florent Altché - Research Engineer, DeepMind - Verified email at google.com
Alessandro Lazaric - Research Scientist, Facebook Artificial Intelligence Research - Verified email at inria.fr
Florian STRUB - DeepMind - Verified email at google.com
Pierre Richemond - Google DeepMind - Verified email at deepmind.com
Emilie Kaufmann - CNRS & Univ. Lille (CRIStAL) - Verified email at inria.fr
Omar Darwiche Domingues - Owkin - Verified email at owkin.com
Branislav Kveton - Amazon - Verified email at amazon.com
Milos Hauskrecht - Professor of Computer Science, University of Pittsburgh - Verified email at pitt.edu
Yunhao Tang - Research Scientist, DeepMind - Verified email at columbia.edu
Mark Rowland - Research Scientist, Google DeepMind - Verified email at google.com
Matteo Pirotta - Research Scientist, Meta (FAIR) - Verified email at fb.com
Carl Doersch - Research Scientist, DeepMind - Verified email at google.com

Michal Valko

Llama @ Meta Paris & Inria & MVA - Ex: Gemini and BYOL @ Google DeepMind

Verified email at meta.com

fine-tuning LLMs, rl with human feedback, deep reinforcement learning


Title	Cited by	Year
Human alignment of large language models through online preference optimisation - D Calandriello, D Guo, R Munos, M Rowland, Y Tang, BA Pires, ... - arXiv preprint arXiv:2403.08635, 2024		2024
Generalized preference optimization: A unified approach to offline alignment - Y Tang, ZD Guo, Z Zheng, D Calandriello, R Munos, M Rowland, ... - arXiv preprint arXiv:2402.05749, 2024	2	2024
Decoding-time realignment of language models - T Liu, S Guo, L Bianco, D Calandriello, Q Berthet, F Llinares, J Hoffmann, ... - arXiv preprint arXiv:2402.02992, 2024		2024
Nash learning from human feedback - R Munos, M Valko, D Calandriello, MG Azar, M Rowland, D Guo, Y Tang, ... - arXiv preprint arXiv:2312.00886, 2024	14	2024
Demonstration-regularized RL - D Tiapkin, D Belomestny, D Calandriello, E Moulines, A Naumov, ... - International Conference on Learning Representations, 2024	2*	2024
A general theoretical paradigm to understand learning from human preferences - MG Azar, M Rowland, B Piot, D Guo, D Calandriello, M Valko, R Munos - International Conference on Artificial Intelligence and Statistics, 2024	53	2024
Local and adaptive mirror descents in extensive-form games - C Fiegel, P Ménard, T Kozuno, R Munos, V Perchet, M Valko - arXiv preprint arXiv:2309.00656, 2024	1	2024
Unlocking the power of representations in long-term novelty-based exploration - A Saade, S Kapturowski, D Calandriello, C Blundell, P Sprechmann, ... - International Conference on Learning Representations, 2024	4	2024
Sharp deviations bounds for Dirichlet weighted sums with application to analysis of Bayesian algorithms - D Belomestny, P Menard, A Naumov, D Tiapkin, M Valko - arXiv preprint arXiv:2304.03056, 2024		2024
Model-free posterior sampling via learning rate randomization - D Tiapkin, D Belomestny, D Calandriello, E Moulines, R Munos, ... - Neural Information Processing Systems, 2023		2023
Half-Hop: A graph upsampling approach for slowing down message passing - M Azabou, V Ganesh, S Thakoor, CH Lin, L Sathidevi, R Liu, M Valko, ... - International Conference on Machine Learning, 2023	6	2023
Quantile credit assignment - T Mesnard, W Chen, A Saade, Y Tang, M Rowland, T Weber, C Lyle, ... - International Conference on Machine Learning, 2023	2	2023
Curiosity in hindsight: Intrinsic exploration in stochastic environments - D Jarrett, C Tallec, F Altché, T Mesnard, R Munos, M Valko - International Conference on Machine Learning, 2023	9	2023
Fast rates for maximum entropy exploration - D Tiapkin, D Belomestny, D Calandriello, E Moulines, R Munos, ... - International Conference on Machine Learning, 2023	7	2023
Middle-mile logistics through the lens of goal-conditioned reinforcement learning - O Eberhard, T Cuvelier, M Valko, B De Backer - NeurIPS 2023 Workshop: Goal-conditioned RL, 2023		2023
DoMo-AC: Doubly multi-step off-policy actor-critic algorithm - Y Tang, T Kozuno, M Rowland, A Harutyunyan, R Munos, BÁ Pires, ... - International Conference on Machine Learning, 2023		2023
VA-learning as a more efficient alternative to Q-learning - Y Tang, R Munos, M Rowland, M Valko - International Conference on Machine Learning, 2023	2	2023
Regularization and variance-weighted regression achieves minimax optimality in linear MDPs: Theory and practice - T Kitamura, T Kozuno, Y Tang, N Vieillard, M Valko, W Yang, J Mei, ... - International Conference on Machine Learning, 2023	2	2023
Adapting to game trees in zero-sum imperfect information games - C Fiegel, P Ménard, T Kozuno, R Munos, V Perchet, M Valko - International Conference on Machine Learning, 2023	4	2023
Understanding self-predictive learning for reinforcement learning - Y Tang, ZD Guo, PH Richemond, BÁ Pires, Y Chandak, R Munos, ... - International Conference on Machine Learning, 2023	20	2023

Articles 1–20
