Authors
Gilles de Hollander, Maarten Marx
Publication date
2011/6/15
Conference
2011 CSI International Symposium on Computer Science and Software Engineering (CSSE)
Pages
54-61
Publisher
IEEE
Description
In this study, parsimonious language models were used to construct word clouds of the proceedings of the European Parliament. Multiple design choices had to be made and are discussed. Important features are stemming during tokenization, including bigrams in the word cloud, and multilingualism. In addition, the original parsimonious language models were extended with an additional term that dampens unigrams already occurring in the word cloud. This algorithm was tested in a small user study, using proceedings of the University of Amsterdam Science faculty's student council. Members of this council had to state their preference among multiple word clouds constructed using either parsimonious language models or simple Term Frequencies (TF) with stop words. The word clouds constructed using parsimonious language models were preferred 68% to 29% (p < 0.05, two-tailed paired t-test). Beside the system …
Total citations
[Chart: citations per year, 2011-2023]
Scholar articles
G de Hollander, M Marx - 2011 CSI International Symposium on Computer …, 2011