Direct preference-based policy optimization without reward modeling. G An, J Lee, X Zuo, N Kosaka, KM Kim, HO Song. Advances in Neural Information Processing Systems 36, 70247-70266, 2023 | 18 | 2023 |
HyperCLOVA X Technical Report. KM Yoo, J Han, S In, H Jeon, J Jeong, J Kang, H Kim, KM Kim, M Kim, ... arXiv preprint arXiv:2404.01954, 2024 | 3 | 2024 |