Qi Cai
Title
Cited by
Year
Provably efficient exploration in policy optimization
Q Cai, Z Yang, C Jin, Z Wang
International Conference on Machine Learning, 1283-1294, 2020
Cited by: 280; Year: 2020
Neural policy gradient methods: Global optimality and rates of convergence
L Wang, Q Cai, Z Yang, Z Wang
International Conference on Learning Representations 2020, 2019
Cited by: 238; Year: 2019
Neural Trust Region/Proximal Policy Optimization Attains Globally Optimal Policy
B Liu, Q Cai, Z Yang, Z Wang
Advances in Neural Information Processing Systems, 10564-10575, 2019
Cited by: 206*; Year: 2019
Neural temporal-difference learning converges to global optima
Q Cai, Z Yang, JD Lee, Z Wang
Advances in Neural Information Processing Systems 32, 2019
Cited by: 134*; Year: 2019
On the Global Optimality of Model-Agnostic Meta-Learning: Reinforcement Learning and Supervised Learning
L Wang, Q Cai, Z Yang, Z Wang
International Conference on Machine Learning, 9837-9846, 2020
Cited by: 40; Year: 2020
On the global convergence of imitation learning: A case for linear quadratic regulator
Q Cai, M Hong, Y Chen, Z Wang
arXiv preprint arXiv:1901.03674, 2019
Cited by: 34; Year: 2019
Generative adversarial imitation learning with neural network parameterization: Global optimality and convergence rate
Y Zhang, Q Cai, Z Yang, Z Wang
International Conference on Machine Learning, 11044-11054, 2020
Cited by: 28*; Year: 2020
Reinforcement learning from partial observation: Linear function approximation with provable sample efficiency
Q Cai, Z Yang, Z Wang
International Conference on Machine Learning, 2485-2522, 2022
Cited by: 24*; Year: 2022
Embed to control partially observed systems: Representation learning with provable sample efficiency
L Wang, Q Cai, Z Yang, Z Wang
arXiv preprint arXiv:2205.13476, 2022
Cited by: 17; Year: 2022
Provably efficient offline reinforcement learning for partially observable Markov decision processes
H Guo, Q Cai, Y Zhang, Z Yang, Z Wang
International Conference on Machine Learning, 8016-8038, 2022
Cited by: 15; Year: 2022
Can Temporal-Difference and Q-Learning Learn Representation? A Mean-Field Theory
Y Zhang, Q Cai, Z Yang, Y Chen, Z Wang
Advances in Neural Information Processing Systems 33, 19680-19692, 2020
Cited by: 15; Year: 2020
An analysis of attention via the lens of exchangeability and latent variable models
Y Zhang, B Liu, Q Cai, L Wang, Z Wang
arXiv preprint arXiv:2212.14852, 2022
Cited by: 7; Year: 2022
Neural temporal difference and Q learning provably converge to global optima
Q Cai, Z Yang, JD Lee, Z Wang
Mathematics of Operations Research 49 (1), 619-651, 2024
Cited by: 4; Year: 2024
Optimistic policy optimization with general function approximations
Q Cai, Z Yang, C Szepesvari, Z Wang
Cited by: 2; Year: 2020
Represent to Control Partially Observed Systems: Representation Learning with Provable Sample Efficiency
L Wang, Q Cai, Z Yang, Z Wang
The Eleventh International Conference on Learning Representations
Cited by: 1*
Provably Efficient Reinforcement Learning
Q Cai
Northwestern University, 2022
Year: 2022
BooVI: Provably Efficient Bootstrapped Value Iteration
B Liu, Q Cai, Z Yang, Z Wang
Advances in Neural Information Processing Systems 34, 7041-7053, 2021
Year: 2021