Authors
Dan Gillick, Cliff Brunk, Oriol Vinyals, Amarnag Subramanya
Publication date
2015/12/1
Journal
arXiv preprint arXiv:1512.00103
Description
Abstract: We describe an LSTM-based model which we call Byte-to-Span (BTS) that reads
text as bytes and outputs span annotations of the form [start, length, label] where start
positions, lengths, and labels are separate entries in our vocabulary. Because we operate
on Unicode bytes rather than language-specific words or characters, we can analyze text in
many languages with a single model. Due to the small vocabulary size, these multilingual
models are very compact, but produce results similar to or better than the state-of-the-art in ...
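The sketch below illustrates the input and output representation the abstract describes. It converts a string into UTF-8 byte symbols, which form a small input vocabulary of at most 256 values. It then flattens byte-level span annotations into a sequence in which start, length, and label are separate tokens. This is a minimal sketch under stated assumptions. The token spellings (`S14`, `L11`), the use of absolute byte offsets, the `LOC` label, and the helper names are all hypothetical. The LSTM model itself is not modeled, and neither are details such as how it handles long inputs or orders its outputs.

```python
from typing import List, Tuple

# A span annotation: (start_byte, length_in_bytes, label).
# Absolute byte offsets are an assumption made for illustration.
Span = Tuple[int, int, str]


def char_span_to_byte_span(text: str, char_start: int, char_len: int, label: str) -> Span:
    """Convert a character-indexed span into a UTF-8 byte-indexed span."""
    start = len(text[:char_start].encode("utf-8"))
    length = len(text[char_start:char_start + char_len].encode("utf-8"))
    return (start, length, label)


def spans_to_tokens(spans: List[Span]) -> List[str]:
    """Flatten spans so start, length and label are separate vocabulary
    entries. The "S"/"L" token spellings are hypothetical."""
    tokens: List[str] = []
    for start, length, label in spans:
        tokens += [f"S{start}", f"L{length}", label]
    return tokens


def tokens_to_spans(tokens: List[str]) -> List[Span]:
    """Inverse of spans_to_tokens: regroup every three tokens into a span."""
    spans: List[Span] = []
    for i in range(0, len(tokens), 3):
        s, l, label = tokens[i:i + 3]
        spans.append((int(s[1:]), int(l[1:]), label))
    return spans


text = "Zürich is in Switzerland"
# Model input: one symbol per byte (values 0-255); "ü" takes two bytes.
input_bytes = list(text.encode("utf-8"))
spans = [
    char_span_to_byte_span(text, 0, 6, "LOC"),    # "Zürich"
    char_span_to_byte_span(text, 13, 11, "LOC"),  # "Switzerland"
]
tokens = spans_to_tokens(spans)
print(len(input_bytes), tokens)  # 25 ['S0', 'L7', 'LOC', 'S14', 'L11', 'LOC']
```

Because offsets count bytes rather than characters, the two-byte "ü" shifts the second span to start at byte 14 even though "Switzerland" begins at character 13.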
Scholar articles
D Gillick, C Brunk, O Vinyals, A Subramanya - arXiv preprint arXiv:1512.00103, 2015