Abstract:
Named Entity Recognition (NER) is a major task of Natural Language Processing. It
consists of detecting and classifying phrases that unambiguously identify a single item.
NER has been researched for more than twenty years, but it remains a major challenge.
NER systems use different approaches, ranging from hand-crafted algorithms to machine
learning, including supervised and unsupervised approaches.
Ensemble machine learning algorithms have attracted recent research interest. The main
idea behind them is that a set of multiple classifiers can be combined to achieve better
performance. Depending on the method, the classifiers are trained independently or
sequentially. Combining them can improve on the NER performance of a single classifier.
In this project, we will focus on NER applied to the Twitter domain. Twitter is a
particularly challenging domain because tweets are short, lack context, are written in an
informal style and are published in real time, with new entities appearing constantly.
The project will also address the challenge of adapting existing NER systems to Spanish,
given that most available NER systems have been specially developed and tuned for English.
Our approach will be based on the use of ensemble methods to combine several available
NER systems. It will be evaluated on a dataset of tweets in the education domain.
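
As an illustration of the general idea of combining several NER systems, the following
is a minimal sketch of token-level majority voting, which is only one of many possible
ensemble strategies and is not necessarily the one used in this project. The function
name majority_vote, the label set and the example tweet with its three system outputs
are hypothetical and chosen only for illustration.

```python
from collections import Counter


def majority_vote(predictions, fallback="O"):
    """Combine per-token labels from several NER systems by majority vote.

    predictions: one list of labels per system, all aligned to the same tokens.
    fallback: label used when no label reaches a strict majority (assumption).
    """
    if not predictions:
        return []
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all systems must label the same tokens")

    combined = []
    for labels in zip(*predictions):
        label, count = Counter(labels).most_common(1)[0]
        # Keep the label only if more than half of the systems agree on it.
        combined.append(label if count > len(predictions) / 2 else fallback)
    return combined


# Hypothetical outputs of three NER systems for one short Spanish tweet.
tokens = ["Estudio", "en", "la", "Universidad", "de", "Sevilla"]
system_a = ["O", "O", "O", "ORG", "ORG", "ORG"]
system_b = ["O", "O", "O", "ORG", "ORG", "LOC"]
system_c = ["O", "O", "O", "O", "O", "LOC"]

ensemble = majority_vote([system_a, system_b, system_c])
for token, label in zip(tokens, ensemble):
    print(f"{token}\t{label}")
```

Voting of this kind needs no additional training data, which is why it is a common
baseline; methods that learn how much to trust each system would require labelled tweets.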