Title | Authors | Venue | Cited by | Year |
Maximize to Explore: One Objective Function Fusing Estimation, Planning, and Exploration | Z Liu, M Lu, W Xiong, H Zhong, H Hu, S Zhang, S Zheng, Z Yang, Z Wang | Advances in Neural Information Processing Systems (NeurIPS), 2023 | 26* | 2023 |
Reason for Future, Act for Now: A Principled Framework for Autonomous LLM Agents with Provable Sample Efficiency | Z Liu, H Hu, S Zhang, H Guo, S Ke, B Liu, Z Wang | arXiv preprint arXiv:2309.17382, 2023 | 21* | 2023 |
Learning Meta Representations for Agents in Multi-Agent Reinforcement Learning | S Zhang, L Shen, L Han, L Shen | Conference on Lifelong Learning Agents (CoLLAs), 2021 | 7 | 2021 |
Asking Before Action: Gather Information in Embodied Decision Making with Language Models | X Chen, S Zhang, P Zhang, L Zhao, J Chen | arXiv preprint arXiv:2305.15695, 2023 | 6 | 2023 |
Conservative Dual Policy Optimization for Efficient Model-Based Reinforcement Learning | S Zhang | Advances in Neural Information Processing Systems (NeurIPS), 2022 | 6 | 2022 |
Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer | Z Liu, M Lu, S Zhang, B Liu, H Guo, Y Yang, J Blanchet, Z Wang | arXiv preprint arXiv:2405.16436, 2024 | 3 | 2024 |
Adaptive Barrier Smoothing for First-Order Policy Gradient with Contact Dynamics | S Zhang, W Jin, Z Wang | International Conference on Machine Learning (ICML), 2023 | 3 | 2023 |
Model-Based Reparameterization Policy Gradient Methods: Theory and Practical Algorithms | S Zhang, B Liu, Z Wang, T Zhao | Advances in Neural Information Processing Systems (NeurIPS), 2023 | 2 | 2023 |
Self-Exploring Language Models: Active Preference Elicitation for Online Alignment | S Zhang, D Yu, H Sharma, Z Yang, S Wang, H Hassan, Z Wang | arXiv preprint arXiv:2405.19332, 2024 | 1 | 2024 |
How Can LLM Guide RL? A Value-Based Approach | S Zhang, S Zheng, S Ke, Z Liu, W Jin, J Yuan, Y Yang, H Yang, Z Wang | arXiv preprint arXiv:2402.16181, 2024 | 1 | 2024 |
Structure-Regularized Attention for Deformable Object Representation | S Zhang, L Shen, Z Li, W Liu | NeurIPS 2020 Workshop on Object Representations for Learning and Reasoning, 2021 | 1 | 2021 |