Trafilatura: A web scraping library and command-line tool for text discovery and extraction A Barbaresi Proceedings of the 59th Annual Meeting of the Association for Computational …, 2021 | 60 | 2021 |
The good, the bad, and the hazy: Design decisions in web corpus construction R Schäfer, A Barbaresi, F Bildhauer 8th Web as Corpus Workshop, pp. 7-15, 2013 | 47 | 2013 |
Die Korpusplattform des „Digitalen Wörterbuchs der deutschen Sprache“ (DWDS) A Geyken, A Barbaresi, J Didakowski, B Jurish, F Wiegand, L Lemnitzer Zeitschrift für germanistische Linguistik 45 (2), 327-344, 2017 | 46 | 2017 |
For a fistful of blogs: Discovery and comparative benchmarking of republishable German content A Barbaresi, KM Würzner KONVENS 2014, NLP4CMC workshop, 2-10, 2014 | 31 | 2014 |
Ad hoc and general-purpose corpus construction from web sources A Barbaresi ENS Lyon, 2015 | 28 | 2015 |
Efficient construction of metadata-enhanced web corpora A Barbaresi 10th Web as Corpus Workshop, 7-16, 2016 | 26 | 2016 |
Focused web corpus crawling R Schäfer, A Barbaresi, F Bildhauer 9th Web as Corpus Workshop (WaC-9), 9-15, 2014 | 24 | 2014 |
A corpus of German political speeches from the 21st century A Barbaresi 11th Language Resources and Evaluation Conference (LREC 2018), 792-797, 2018 | 22 | 2018 |
Collection and indexing of tweets with a geographical focus A Barbaresi Tenth International Conference on Language Resources and Evaluation (LREC …, 2016 | 20 | 2016 |
Finding viable seed URLs for web corpora: a scouting approach and comparative study of available sources A Barbaresi 9th Web as Corpus Workshop (WaC-9), 14th Conference of the European Chapter …, 2014 | 17 | 2014 |
Out-of-the-box and into the ditch? multilingual evaluation of generic text extraction tools A Barbaresi, G Lejeune Language Resources and Evaluation Conference (LREC 2020), 5-13, 2020 | 16 | 2020 |
Generic web content extraction with open-source software A Barbaresi KONVENS 2019, 267-268, 2019 | 16 | 2019 |
Diminutivvariation in österreichischen elektronischen Korpora S Schwaiger, A Barbaresi, K Korecky-Kröll, J Ransmayr, W Dressler Lars Bülow, Ann Kathrin Fischer & Kristina Herbert (Hrsg.), Dimensionen des …, 2019 | 16 | 2019 |
An unsupervised morphological criterion for discriminating similar languages A Barbaresi 3rd Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial …, 2016 | 16 | 2016 |
Language-classified Open Subtitles (LACLOS): download, extraction, and quality assessment A Barbaresi BBAW, 2014 | 16 | 2014 |
Crawling microblogging services to gather language-classified URLs. Workflow and case study A Barbaresi Annual Meeting of the Association for Computational Linguistics, 9-15, 2013 | 15 | 2013 |
The Vast and the Focused: On the need for thematic web and blog corpora A Barbaresi 7th Workshop on Challenges in the Management of Large Corpora (CMLC-7), 29-32, 2019 | 11 | 2019 |
Challenges in web corpus construction for low-resource languages in a post-BootCaT world A Barbaresi 6th Language & Technology Conference, Less Resourced Languages special track …, 2013 | 11 | 2013 |
German Political Speeches - Corpus and Visualization A Barbaresi DGfS-CL poster session, 2012 | 11 | 2012 |
A constellation and a rhizome: two studies on toponyms in literary texts A Barbaresi VISUALISIERUNG, 167, 2018 | 10 | 2018 |