Neural trust region/proximal policy optimization attains globally optimal policy B Liu, Q Cai, Z Yang, Z Wang Advances in Neural Information Processing Systems 32, 2019 | 206* | 2019 |
Off-policy evaluation and learning from logged bandit feedback: Error reduction via surrogate policy Y Xie, B Liu, Q Liu, Z Wang, Y Zhou, J Peng International Conference on Learning Representations, 2018 | 20 | 2018 |
Inducing Equilibria via Incentives: Simultaneous Design-and-Play Ensures Global Convergence B Liu, J Li, Z Yang, HT Wai, M Hong, Y Nie, Z Wang Advances in Neural Information Processing Systems, 2022 | 12* | 2022 |
Differentiable bilevel programming for Stackelberg congestion games J Li, J Yu, Q Wang, B Liu, Z Wang, YM Nie arXiv preprint arXiv:2209.07618, 2022 | 12 | 2022 |
Reason for future, act for now: A principled framework for autonomous LLM agents with provable sample efficiency Z Liu, H Hu, S Zhang, H Guo, S Ke, B Liu, Z Wang arXiv preprint arXiv:2309.17382, 2023 | 10* | 2023 |
An analysis of attention via the lens of exchangeability and latent variable models Y Zhang, B Liu, Q Cai, L Wang, Z Wang arXiv preprint arXiv:2212.14852, 2022 | 7 | 2022 |
Relational Reasoning via Set Transformers: Provable Efficiency and Applications to MARL F Zhang, B Liu, K Wang, VYF Tan, Z Yang, Z Wang Advances in Neural Information Processing Systems, 2022 | 6 | 2022 |
Let models speak ciphers: Multiagent debate through embeddings C Pham, B Liu, Y Yang, Z Chen, T Liu, J Yuan, BA Plummer, Z Wang, ... arXiv preprint arXiv:2310.06272, 2023 | 5 | 2023 |
Policy Optimization in Zero-Sum Markov Games: Fictitious Self-Play Provably Attains Nash Equilibria B Liu, Z Yang, Z Wang | 3 | 2020 |
Model-based reparameterization policy gradient methods: Theory and practical algorithms S Zhang, B Liu, Z Wang, T Zhao Advances in Neural Information Processing Systems 36, 2024 | 2 | 2024 |
Achieving hierarchy-free approximation for bilevel programs with equilibrium constraints J Li, J Yu, B Liu, Y Nie, Z Wang International Conference on Machine Learning, 20312-20335, 2023 | 1 | 2023 |
Differentiable Arbitrating in Zero-sum Markov Games J Wang, M Song, F Gao, B Liu, Z Wang, Y Wu International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2023 | | 2023 |
Double Duality: Variational Primal-Dual Policy Optimization for Constrained Reinforcement Learning Z Li, B Liu, Z Yang, Z Wang, M Wang Journal of Machine Learning Research 24 (385), 1-43, 2023 | | 2023 |
BooVI: Provably Efficient Bootstrapped Value Iteration B Liu, Q Cai, Z Yang, Z Wang Advances in Neural Information Processing Systems 34, 7041-7053, 2021 | | 2021 |