Poster Presentation
Workshop: 2nd Workshop on Practical ML for Developing Countries: Learning Under Limited/low Resource Scenarios

ANEA: Distant Supervision for Low-Resource Named Entity Recognition

Michael Hedderich


Distant supervision makes it possible to obtain labeled training corpora in low-resource settings where only limited hand-annotated data exists. To be used effectively, however, the distant supervision must be easy to gather. In this work, we present ANEA, a tool that automatically annotates named entities in text based on entity lists. It spans the whole pipeline, from obtaining the lists to analyzing the errors of the distant supervision. A tuning step allows users to improve the automatic annotation with their linguistic insights without labelling or checking all tokens manually. In six low-resource scenarios, we show that the F1-score can be increased by an average of 18 points through distantly supervised data obtained by ANEA.
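
To illustrate the general idea of list-based distant supervision, the following is a minimal sketch in Python. It is not ANEA's implementation or API. The function name `annotate`, the `max_len` parameter, the case-insensitive longest-match strategy, and the example entity lists are all assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical helper, not part of ANEA: label tokens by looking them up
# in per-type entity lists (longest match first, case-insensitive).
def annotate(
    tokens: List[str],
    entity_lists: Dict[str, Set[Tuple[str, ...]]],
    max_len: int = 3,
) -> List[str]:
    labels = ["O"] * len(tokens)  # "O" marks tokens outside any entity
    lowered = [t.lower() for t in tokens]
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            span = tuple(lowered[i:i + n])
            for label, entries in entity_lists.items():
                if span in entries:
                    labels[i:i + n] = [label] * n
                    i += n
                    matched = True
                    break
            if matched:
                break
        if not matched:
            i += 1
    return labels


# Illustrative entity lists (assumed data, lowercased token tuples).
example_lists = {
    "PER": {("angela", "merkel")},
    "LOC": {("tartu",)},
}
example_tokens = ["Angela", "Merkel", "visited", "Tartu", "yesterday"]
example_labels = annotate(example_tokens, example_lists)
```

In a sketch like this, a manual tuning step could amount to editing the lists, for example removing ambiguous entries that produce false positives, rather than correcting labels token by token.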