| Title | Authors | Venue | Cited by | Year |
| --- | --- | --- | --- | --- |
| State-wise safe reinforcement learning with pixel observations | SS Zhan, Y Wang, Q Wu, R Jiao, C Huang, Q Zhu | 6th Learning for Dynamics & Control Conference (L4DC 2024), 2023 | 10 | 2023 |
| Highway reinforcement learning | Y Wang, M Strupl, F Faccio, Q Wu, H Liu, M Grudzień, X Tan, ... | arXiv preprint arXiv:2405.18289, 2024 | 2 | 2024 |
| Boosting Reinforcement Learning with Strongly Delayed Feedback Through Auxiliary Short Delays | Q Wu, SS Zhan, Y Wang, Y Wang, CW Lin, C Lv, Q Zhu, J Schmidhuber, ... | Forty-first International Conference on Machine Learning (ICML 2024), 2024 | 2* | 2024 |
| Variational Delayed Policy Optimization | Q Wu, SS Zhan, Y Wang, Y Wang, CW Lin, C Lv, Q Zhu, C Huang | arXiv preprint arXiv:2405.14226, 2024 | 1 | 2024 |
| Highway Value Iteration Networks | Y Wang, W Li, F Faccio, Q Wu, J Schmidhuber | Forty-first International Conference on Machine Learning (ICML 2024), 2024 | 1 | 2024 |
| Greedy-Step Off-Policy Reinforcement Learning | Y Wang, Q Wu, P He, X Tan | arXiv preprint arXiv:2102.11717, 2021 | 1 | 2021 |
| Model-Based Reward Shaping for Adversarial Inverse Reinforcement Learning in Stochastic Environments | SS Zhan, Q Wu, P Wang, Y Wang, R Jiao, C Huang, Q Zhu | arXiv preprint arXiv:2410.03847, 2024 | | 2024 |
| Scaling Value Iteration Networks to 5000 Layers for Extreme Long-Term Planning | Y Wang, Q Wu, W Li, DR Ashley, F Faccio, C Huang, J Schmidhuber | arXiv preprint arXiv:2406.08404, 2024 | | 2024 |
| Learning Downstream Task by Selectively Capturing Complementary Knowledge from Multiple Self-supervisedly Learning Pretexts | J Yao, Q Wu, Q Feng, S Chen | arXiv preprint arXiv:2204.05248, 2022 | | 2022 |
| Expected-Max Ensembled Q-learning with Temporally-Varying Exploration | Q Wu, Y Wang | | | 2022 |