ParaCrawl: Web-scale acquisition of parallel corpora M Bañón, P Chen, B Haddow, K Heafield, H Hoang, M Esplà-Gomis, ... Association for Computational Linguistics (ACL), 2020 | 204 | 2020 |
Bifixer and bicleaner: two open-source tools to clean your parallel data G Ramírez‐Sánchez, J Zaragoza-Bernabeu, M Bañón, S Ortiz-Rojas Proceedings of the 22nd Annual Conference of the European Association for …, 2020 | 35 | 2020 |
Multilingual comparable corpora of parliamentary debates ParlaMint 2.1 T Erjavec, M Ogrodniczuk, P Osenova, N Ljubešić, K Simov, V Grigorova, ... CLARIN ERIC, 2021 | 15 | 2021 |
MaCoCu: Massive collection and curation of monolingual and bilingual data: focus on under-resourced languages M Bañón, M Esplà-Gomis, ML Forcada, C García-Romero, T Kuzman, ... 23rd Annual Conference of the European Association for Machine Translation …, 2022 | 11 | 2022 |
Bicleaner at WMT 2020: Universitat d’Alacant-Prompsit’s submission to the parallel corpus filtering shared task M Esplà-Gomis, VM Sánchez-Cartagena, J Zaragoza-Bernabeu, ... Proceedings of the Fifth Conference on Machine Translation, 952-958, 2020 | 11 | 2020 |
Bicleaner AI: Bicleaner goes neural J Zaragoza-Bernabeu, G Ramírez‐Sánchez, M Bañón, S Ortiz-Rojas Proceedings of the Thirteenth Language Resources and Evaluation Conference …, 2022 | 8 | 2022 |
Slovene-English parallel corpus MaCoCu-sl-en 2.0 M Bañón, M Chichirau, M Esplà-Gomis, ML Forcada, A Galiano-Jiménez, ... Jožef Stefan Institute, 2023 | 4 | 2023 |
HPLT: High Performance Language Technologies M Aulamo, N Bogoychev, S Ji, G Nail, G Ramírez‐Sánchez, J Tiedemann, ... Proceedings of the 24th Annual Conference of the European Association for …, 2023 | 2 | 2023 |
Croatian-English parallel corpus MaCoCu-hr-en 2.0 M Bañón, M Chichirau, M Esplà-Gomis, ML Forcada, A Galiano-Jiménez, ... Jožef Stefan Institute, 2023 | 2 | 2023 |
Serbian-English parallel corpus MaCoCu-sr-en 1.0 M Bañón, M Chichirau, M Esplà-Gomis, ML Forcada, A Galiano-Jiménez, ... Jožef Stefan Institute, 2023 | 2* | 2023 |
FastSpell: the LangId Magic Spell M Bañón, J Zaragoza-Bernabeu, G Ramírez-Sánchez, S Ortiz-Rojas arXiv preprint arXiv:2404.08345, 2024 | 1 | 2024 |
OpusCleaner and OpusTrainer, open source toolkits for training Machine Translation and Large language models N Bogoychev, J van der Linde, G Nail, B Haddow, J Zaragoza-Bernabeu, ... arXiv preprint arXiv:2311.14838, 2023 | 1 | 2023 |
Bulgarian-English parallel corpus MaCoCu-bg-en 2.0 M Bañón, M Chichirau, M Esplà-Gomis, ML Forcada, A Galiano-Jiménez, ... Jožef Stefan Institute, 2023 | 1 | 2023 |
Human evaluation of web-crawled parallel corpora for machine translation G Ramírez‐Sánchez, M Bañón, J Zaragoza-Bernabeu, S Ortiz-Rojas Proceedings of the 2nd Workshop on Human Evaluation of NLP Systems (HumEval …, 2022 | 1 | 2022 |
A New Massive Multilingual Dataset for High-Performance Language Technologies O de Gibert, G Nail, N Arefyev, M Bañón, J van der Linde, S Ji, ... arXiv preprint arXiv:2403.14009, 2024 | | 2024 |
Ukrainian-English parallel corpus MaCoCu-uk-en 1.0 M Bañón, M Chichirau, M Esplà-Gomis, ML Forcada, A Galiano-Jiménez, ... Jožef Stefan Institute, 2023 | | 2023 |
Albanian-English parallel corpus MaCoCu-sq-en 1.0 M Bañón, M Chichirau, M Esplà-Gomis, ML Forcada, A Galiano-Jiménez, ... Jožef Stefan Institute, 2023 | | 2023 |
Icelandic-English parallel corpus MaCoCu-is-en 2.0 M Bañón, M Chichirau, M Esplà-Gomis, ML Forcada, A Galiano-Jiménez, ... Jožef Stefan Institute, 2023 | | 2023 |
Turkish-English parallel corpus MaCoCu-tr-en 2.0 M Bañón, M Chichirau, M Esplà-Gomis, ML Forcada, A Galiano-Jiménez, ... Jožef Stefan Institute, 2023 | | 2023 |
Maltese-English parallel corpus MaCoCu-mt-en 2.0 M Bañón, M Chichirau, M Esplà-Gomis, ML Forcada, A Galiano-Jiménez, ... Jožef Stefan Institute, 2023 | | 2023 |