MaCoCu: Massive collection and curation of monolingual and bilingual data: focus on under-resourced languages M Bañón, M Esplà-Gomis, ML Forcada, C García-Romero, T Kuzman, ... 23rd Annual Conference of the European Association for Machine Translation …, 2022 | 11 | 2022 |
The GINCO training dataset for web genre identification of documents out in the wild T Kuzman, P Rupnik, N Ljubešić arXiv preprint arXiv:2201.03857, 2022 | 10 | 2022 |
ParlaSpeech-HR - a freely available ASR dataset for Croatian bootstrapped from the ParlaMint corpus N Ljubešić, D Koržinek, P Rupnik, IP Jazbec Proceedings of the workshop ParlaCLARIN III within the 13th language …, 2022 | 9 | 2022 |
Multilingual comparable corpora of parliamentary debates ParlaMint 3.0 T Erjavec, M Kopp, M Ogrodniczuk, P Osenova, D Fišer, H Pirker, T Wissik, ... CLARIN ERIC, 2023 | 5 | 2023 |
The ParlaSent-BCS dataset of sentiment-annotated parliamentary debates from Bosnia-Herzegovina, Croatia, and Serbia M Mochtak, P Rupnik, N Ljubešić arXiv preprint arXiv:2206.00929, 2022 | 5 | 2022 |
Get to Know Your Parallel Data: Performing English Variety and Genre Classification over MaCoCu Corpora T Kuzman, P Rupnik, N Ljubešić Tenth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial …, 2023 | 4 | 2023 |
Slovene-English parallel corpus MaCoCu-sl-en 2.0 M Bañón, M Chichirau, M Esplà-Gomis, ML Forcada, A Galiano-Jiménez, ... Jožef Stefan Institute, 2023 | 4 | 2023 |
BENCHić-lang: A Benchmark for Discriminating between Bosnian, Croatian, Montenegrin and Serbian P Rupnik, T Kuzman, N Ljubešić Tenth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial …, 2023 | 2 | 2023 |
Serbian-English parallel corpus MaCoCu-sr-en 1.0 M Bañón, M Chichirau, M Esplà-Gomis, ML Forcada, A Galiano-Jiménez, ... Jožef Stefan Institute, 2023 | 2* | 2023 |
Croatian-English parallel corpus MaCoCu-hr-en 2.0 M Bañón, M Chichirau, M Esplà-Gomis, ML Forcada, A Galiano-Jiménez, ... Jožef Stefan Institute, 2023 | 2 | 2023 |
The sentiment corpus of parliamentary debates ParlaSent-BCS v1.0 M Mochtak, P Rupnik, N Ljubešić Jožef Stefan Institute, 2022 | 2 | 2022 |
The Twitter user dataset for discriminating between Bosnian, Croatian, Montenegrin and Serbian Twitter-HBS 1.0 N Ljubešić, P Rupnik Jožef Stefan Institute, 2022 | 2 | 2022 |
Slovene Web genre identification corpus GINCO 1.0 T Kuzman, M Brglez, P Rupnik, N Ljubešić Jožef Stefan Institute, 2021 | 2 | 2021 |
Montenegrin web corpus CLASSLA-web.cnr 1.0 N Ljubešić, P Rupnik, T Kuzman Jožef Stefan Institute, 2024 | 1 | 2024 |
Macedonian web corpus CLASSLA-web.mk 1.0 N Ljubešić, P Rupnik, T Kuzman Jožef Stefan Institute, 2024 | 1* | 2024 |
The ParlaSent multilingual training dataset for sentiment identification in parliamentary proceedings M Mochtak, P Rupnik, N Ljubešić arXiv preprint arXiv:2309.09783, 2023 | 1 | 2023 |
Spoken corpus Gos 2.1 (transcriptions) D Verdonik, A Zwitter Vitez, J Zemljarič Miklavčič, S Krek, M Stabej, ... Centre for Language Resources and Technologies, University of Ljubljana, 2023 | 1 | 2023 |
Bulgarian-English parallel corpus MaCoCu-bg-en 2.0 M Bañón, M Chichirau, M Esplà-Gomis, ML Forcada, A Galiano-Jiménez, ... Jožef Stefan Institute, 2023 | 1 | 2023 |
Improving Effectiveness of a Coaching System Through Preference Learning M Žnidaršič, A Osojnik, P Rupnik, B Ženko Proceedings of the 14th PErvasive Technologies Related to Assistive …, 2021 | 1 | 2021 |
"Choice of plausible alternatives" datasets in South Slavic dialects DIALECT-COPA N Ljubešić, T Kuzman, P Rupnik, S Milosavljević, N Galant, S Benčina, ... Jožef Stefan Institute, 2024 | | 2024 |