Adrien Barbaresi
Berlin-Brandenburg Academy of Sciences
Verified email at bbaw.de
Title
Cited by
Year
Trafilatura: A web scraping library and command-line tool for text discovery and extraction
A Barbaresi
Proceedings of the 59th Annual Meeting of the Association for Computational …, 2021
Cited by: 57; Year: 2021
The good, the bad, and the hazy: Design decisions in web corpus construction
R Schäfer, A Barbaresi, F Bildhauer
8th Web as Corpus Workshop, pp. 7-15, 2013
Cited by: 47; Year: 2013
Die Korpusplattform des „Digitalen Wörterbuchs der deutschen Sprache“ (DWDS)
A Geyken, A Barbaresi, J Didakowski, B Jurish, F Wiegand, L Lemnitzer
Zeitschrift für germanistische Linguistik 45 (2), 327-344, 2017
Cited by: 46; Year: 2017
For a fistful of blogs: Discovery and comparative benchmarking of republishable German content
A Barbaresi, KM Würzner
KONVENS 2014, NLP4CMC workshop, 2-10, 2014
Cited by: 31; Year: 2014
Ad hoc and general-purpose corpus construction from web sources
A Barbaresi
ENS Lyon, 2015
Cited by: 28; Year: 2015
Efficient construction of metadata-enhanced web corpora
A Barbaresi
10th Web as Corpus Workshop, 7-16, 2016
Cited by: 26; Year: 2016
Focused web corpus crawling
R Schäfer, A Barbaresi, F Bildhauer
9th Web as Corpus Workshop (WaC-9), 9-15, 2014
Cited by: 24; Year: 2014
A corpus of German political speeches from the 21st century
A Barbaresi
11th Language Resources and Evaluation Conference (LREC 2018), 792-797, 2018
Cited by: 21; Year: 2018
Collection and indexing of tweets with a geographical focus
A Barbaresi
Tenth International Conference on Language Resources and Evaluation (LREC …, 2016
Cited by: 20; Year: 2016
Finding viable seed URLs for web corpora: a scouting approach and comparative study of available sources
A Barbaresi
9th Web as Corpus Workshop (WaC-9), 14th Conference of the European Chapter …, 2014
Cited by: 17; Year: 2014
Out-of-the-box and into the ditch? Multilingual evaluation of generic text extraction tools
A Barbaresi, G Lejeune
Language Resources and Evaluation Conference (LREC 2020), 5-13, 2020
Cited by: 16; Year: 2020
Generic web content extraction with open-source software
A Barbaresi
KONVENS 2019, 267-268, 2019
Cited by: 16; Year: 2019
An unsupervised morphological criterion for discriminating similar languages
A Barbaresi
3rd Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial …, 2016
Cited by: 16; Year: 2016
Language-classified Open Subtitles (LACLOS): download, extraction, and quality assessment
A Barbaresi
BBAW, 2014
Cited by: 16; Year: 2014
Crawling microblogging services to gather language-classified URLs. Workflow and case study
A Barbaresi
Annual Meeting of the Association for Computational Linguistics, 9-15, 2013
Cited by: 15; Year: 2013
Diminutivvariation in österreichischen elektronischen Korpora
S Schwaiger, A Barbaresi, K Korecky-Kröll, J Ransmayr, W Dressler
Lars Bülow, Ann Kathrin Fischer & Kristina Herbert (Hrsg.), Dimensionen des …, 2019
Cited by: 14; Year: 2019
The Vast and the Focused: On the need for thematic web and blog corpora
A Barbaresi
7th Workshop on Challenges in the Management of Large Corpora (CMLC-7), 29-32, 2019
Cited by: 11; Year: 2019
German Political Speeches-Corpus and Visualization
A Barbaresi
DGfS-CL poster session, 2012
Cited by: 11; Year: 2012
A constellation and a rhizome: two studies on toponyms in literary texts
A Barbaresi
VISUALISIERUNG, 167, 2018
Cited by: 10; Year: 2018
Discriminating between similar languages using weighted subword features
A Barbaresi
Fourth Workshop on NLP for Similar Languages, Varieties and Dialects …, 2017
Cited by: 10; Year: 2017