The advantage regret-matching actor-critic A Gruslys, M Lanctot, R Munos, F Timbers, M Schmid, J Perolat, D Morrill, ... arXiv preprint arXiv:2008.12234, 2020 | 24 | 2020 |
Learning to navigate Wikipedia by taking random walks M Zaheer, K Marino, W Grathwohl, J Schultz, W Shang, S Babayan, ... Advances in Neural Information Processing Systems 35, 1529-1541, 2022 | 4 | 2022 |