Han Zhong
Verified email at stu.pku.edu.cn
Title
Cited by
Year
GEC: A Unified Framework for Interactive Decision Making in MDP, POMDP, and Beyond
H Zhong, W Xiong, S Zheng, L Wang, Z Wang, Z Yang, T Zhang
arXiv preprint arXiv:2211.01962, 2022
Cited by: 50*, Year: 2022
Human-in-the-loop: Provably Efficient Preference-based Reinforcement Learning with General Function Approximation
X Chen, H Zhong, Z Yang, Z Wang, L Wang
International Conference on Machine Learning, 3773-3793, 2022
Cited by: 44, Year: 2022
Nearly Minimax Optimal Offline Reinforcement Learning with Linear Function Approximation: Single-Agent MDP and Markov Game
W Xiong, H Zhong, C Shi, C Shen, L Wang, T Zhang
arXiv preprint arXiv:2205.15512, 2022
Cited by: 42, Year: 2022
Iterative Preference Learning from Human Feedback: Bridging Theory and Practice for RLHF under KL-constraint
W Xiong, H Dong, C Ye, Z Wang, H Zhong, H Ji, N Jiang, T Zhang
ICLR 2024 Workshop on Mathematical and Empirical Understanding of Foundation …, 2023
Cited by: 40*, Year: 2023
Can Reinforcement Learning Find Stackelberg-Nash Equilibria in General-Sum Markov Games with Myopically Rational Followers?
H Zhong, Z Yang, Z Wang, MI Jordan
Journal of Machine Learning Research 24 (35), 1-52, 2023
Cited by: 40*, Year: 2023
Pessimistic minimax value iteration: Provably efficient equilibrium learning from offline datasets
H Zhong, W Xiong, J Tan, L Wang, T Zhang, Z Wang, Z Yang
International Conference on Machine Learning, 27117-27142, 2022
Cited by: 38, Year: 2022
Maximize to Explore: One Objective Function Fusing Estimation, Planning, and Exploration
Z Liu, M Lu, W Xiong, H Zhong, H Hu, S Zhang, S Zheng, Z Yang, Z Wang
Thirty-seventh Conference on Neural Information Processing Systems, 2023
Cited by: 25*, Year: 2023
A Self-Play Posterior Sampling Algorithm for Zero-Sum Markov Games
W Xiong, H Zhong, C Shi, C Shen, T Zhang
International Conference on Machine Learning, 24496-24523, 2022
Cited by: 25, Year: 2022
Why robust generalization in deep learning is difficult: Perspective of expressive power
B Li, J Jin, H Zhong, J Hopcroft, L Wang
Advances in Neural Information Processing Systems 35, 4370-4384, 2022
Cited by: 21, Year: 2022
A theoretical analysis of optimistic proximal policy optimization in linear markov decision processes
H Zhong, T Zhang
Advances in Neural Information Processing Systems 36, 2024
Cited by: 19, Year: 2024
Double pessimism is provably efficient for distributionally robust offline reinforcement learning: Generic algorithm and robust partial coverage
J Blanchet, M Lu, T Zhang, H Zhong
Advances in Neural Information Processing Systems 36, 2024
Cited by: 18, Year: 2024
Optimistic Policy Optimization is Provably Efficient in Non-stationary MDPs
H Zhong, Z Yang, Z Wang, C Szepesvári
arXiv preprint arXiv:2110.08984, 2021
Cited by: 18, Year: 2021
Nearly optimal policy optimization with stable at any time guarantee
T Wu, Y Yang, H Zhong, L Wang, S Du, J Jiao
International Conference on Machine Learning, 24243-24265, 2022
Cited by: 13, Year: 2022
DPO Meets PPO: Reinforced Token Optimization for RLHF
H Zhong, G Feng, W Xiong, L Zhao, D He, J Bian, L Wang
arXiv preprint arXiv:2404.18922, 2024
Cited by: 9, Year: 2024
Rewards-in-Context: Multi-objective Alignment of Foundation Models with Dynamic Preference Adjustment
R Yang, X Pan, F Luo, S Qiu, H Zhong, D Yu, J Chen
arXiv preprint arXiv:2402.10207, 2024
Cited by: 8, Year: 2024
Towards Robust Offline Reinforcement Learning under Diverse Data Corruption
R Yang, H Zhong, J Xu, A Zhang, C Zhang, L Han, T Zhang
arXiv preprint arXiv:2310.12955, 2023
Cited by: 7, Year: 2023
Tackling heavy-tailed rewards in reinforcement learning with function approximation: Minimax optimal and instance-dependent regret bounds
J Huang, H Zhong, L Wang, L Yang
Advances in Neural Information Processing Systems 36, 2024
Cited by: 6, Year: 2024
Provable Sim-to-real Transfer in Continuous Domain with Partial Observations
J Hu, H Zhong, C Jin, L Wang
arXiv preprint arXiv:2210.15598, 2022
Cited by: 6, Year: 2022
Breaking the Moments Condition Barrier: No-Regret Algorithm for Bandits with Super Heavy-Tailed Payoffs
H Zhong, J Huang, L Yang, L Wang
Advances in Neural Information Processing Systems 34, 2021
Cited by: 6, Year: 2021
A Reduction-Based Framework for Conservative Bandits and Reinforcement Learning
Y Yang, T Wu, H Zhong, E Garcelon, M Pirotta, A Lazaric, L Wang, SS Du
International Conference on Learning Representations, 2021/9/29, 2021
Cited by: 6*, Year: 2021