MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts. M. Pióro, K. Ciebiera, K. Król, J. Ludziejewski, M. Krutul, J. Krajewski, et al. arXiv preprint arXiv:2401.04081, 2024. Cited by 14.
Scaling Laws for Fine-Grained Mixture of Experts. J. Krajewski, J. Ludziejewski, K. Adamczewski, M. Pióro, M. Krutul, et al. arXiv preprint arXiv:2402.07871, 2024. Cited by 1.
Mixture of Tokens: Efficient LLMs through Cross-Example Aggregation. S. Antoniak, S. Jaszczur, M. Krutul, M. Pióro, J. Krajewski, J. Ludziejewski, et al. arXiv preprint arXiv:2310.15961, 2023.