| Title | Authors | Venue | Cited by | Year |
|---|---|---|---|---|
| DeepNet: Scaling Transformers to 1,000 Layers | H Wang, S Ma, L Dong, S Huang, D Zhang, F Wei | IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024 | 112 | 2024 |
| Magneto: A Foundation Transformer | Hongyu Wang, Shuming Ma, Shaohan Huang, Li Dong, Wenhui Wang, Zhiliang Peng, ... | International Conference on Machine Learning, 2023 | 29* | 2023 |
| The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits | S Ma, H Wang, L Ma, L Wang, W Wang, S Huang, L Dong, R Wang, J Xue, ... | arXiv preprint arXiv:2402.17764, 2024 | 13 | 2024 |
| TorchScale: Transformers at Scale | S Ma, H Wang, S Huang, W Wang, Z Chi, L Dong, A Benhaim, B Patra, ... | arXiv preprint arXiv:2211.13184, 2022 | 11 | 2022 |
| BitNet: Scaling 1-bit Transformers for Large Language Models | H Wang, S Ma, L Dong, S Huang, H Wang, L Ma, F Yang, R Wang, Y Wu, ... | arXiv preprint arXiv:2310.11453, 2023 | 10 | 2023 |