| Title | Authors | Venue | Cited by | Year |
|---|---|---|---|---|
| SparseAdapter: An Easy Approach for Improving the Parameter-Efficiency of Adapters | S He, L Ding, D Dong, M Zhang, D Tao | arXiv preprint arXiv:2210.04284 | 56 | 2022 |
| LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training | LME Team | https://github.com/pjlab-sys4nlp/llama-moe | 6 | 2023 |
| PAD-Net: An Efficient Framework for Dynamic Networks | S He, L Ding, D Dong, B Liu, F Yu, D Tao | arXiv preprint arXiv:2211.05528 | 5 | 2022 |
| SD-Conv: Towards the Parameter-Efficiency of Dynamic Convolution | S He, C Jiang, D Dong, L Ding | IEEE/CVF Winter Conference on Applications of Computer Vision (WACV 2023) | 3 | 2022 |
| Demystifying the Compression of Mixture-of-Experts Through a Unified Framework | S He, D Dong, L Ding, A Li | arXiv preprint arXiv:2406.02500 | 2 | 2024 |
| Dynamic Data Mixing Maximizes Instruction Tuning for Mixture-of-Experts | T Zhu, D Dong, X Qu, J Ruan, W Chen, Y Cheng | arXiv preprint arXiv:2406.11256 | 1 | 2024 |
| DLO: Dynamic Layer Operation for Efficient Vertical Scaling of LLMs | Z Tan, D Dong, X Zhao, J Peng, Y Cheng, T Chen | arXiv preprint arXiv:2407.11030 | | 2024 |
| LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training | T Zhu, X Qu, D Dong, J Ruan, J Tong, C He, Y Cheng | arXiv preprint arXiv:2406.16554 | | 2024 |
| iDAT: inverse Distillation Adapter-Tuning | J Ruan, J Gao, M Xie, D Dong, S Xiang, T Liu, Y Fu | arXiv preprint arXiv:2403.15750 | | 2024 |
| A Graph is Worth K Words: Euclideanizing Graph using Pure Transformer | Z Gao, D Dong, C Tan, J Xia, B Hu, SZ Li | arXiv preprint arXiv:2402.02464 | | 2024 |