Development of a Named Entity Recognition System based on Ensemble Machine Learning Algorithms

Constantino Román-Gómez. (2015). Development of a Named Entity Recognition System based on Ensemble Machine Learning Algorithms. Final Career Project (TFG). ETSI Telecomunicación, Universidad Politécnica de Madrid.

Abstract:
Named Entity Recognition (NER) is a major task in Natural Language Processing. It consists of detecting and classifying phrases that unambiguously identify a single item. NER has been researched for more than twenty years, but it remains a major challenge. NER systems use different approaches, ranging from hand-crafted algorithms to machine learning, both supervised and unsupervised. Ensemble machine learning algorithms have been researched more recently. The main idea behind them is that multiple classifiers, each trained sequentially, can be combined to achieve better performance. Such a combination can improve on the NER performance of a single classifier. In this project, we will focus on NER applied to the Twitter domain. Twitter is a particularly challenging domain because tweets are short, lack context, are written in an informal style and appear in real time, with new entities emerging constantly. The project will also address the challenge of adapting existing NER systems to Spanish, given that most available NER systems have been developed and tuned specifically for English. Our approach will be based on ensemble methods that combine several available NER systems. The project will be evaluated on a dataset of tweets in the education domain.
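
As an illustration of combining the outputs of several NER systems, the following minimal sketch applies a per-token plurality vote in Python. It is not the method developed in the project, whose combination scheme is not detailed in this abstract. The function name majority_vote, the three taggers and their labels are hypothetical, and the example assumes that all systems label the same tokenization of the tweet.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[str]], default: str = "O") -> List[str]:
    """Combine per-token labels from several taggers by plurality vote.

    `predictions` holds one label sequence per tagger, all aligned to the
    same tokens. When the top vote is tied, `default` is returned.
    """
    if not predictions:
        return []
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all taggers must label the same tokens")

    combined = []
    for labels in zip(*predictions):  # labels proposed for one token
        counts = Counter(labels)
        top_count = max(counts.values())
        winners = [label for label, c in counts.items() if c == top_count]
        # Accept the label only if it is the unique winner.
        combined.append(winners[0] if len(winners) == 1 else default)
    return combined


# Hypothetical outputs of three NER systems for one Spanish tweet.
tokens = ["Estudio", "en", "la", "UPM", "de", "Madrid"]
tagger_a = ["O", "O", "O", "ORG", "O", "LOC"]
tagger_b = ["O", "O", "O", "ORG", "O", "ORG"]
tagger_c = ["O", "O", "O", "O", "O", "LOC"]

result = majority_vote([tagger_a, tagger_b, tagger_c])
for token, label in zip(tokens, result):
    print(token, label)
```

Voting is only one of the simplest ways to build an ensemble. Weighted voting or a classifier trained on the outputs of the individual systems are common alternatives when some systems are more reliable than others.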