Canal GSI

The paper "GSI-UPM at SemEval-2019 Task 5: Semantic Similarity and Word Embeddings for Multilingual detection of Hate speech against Immigrants and Women on Twitter", by Diego Benito, Óscar Araque, and Carlos A. Iglesias, has been published in the proceedings of the Thirteenth International Workshop on Semantic Evaluation (SemEval-2019).

The SemEval workshop focuses on evaluating and comparing systems that analyse diverse semantic phenomena in text. Its aims are to extend the current state of the art in semantic analysis and to create high-quality annotated datasets for a range of increasingly challenging problems in natural language semantics. In particular, SemEval-2019 Task 5 addresses the detection of hate speech directed at two specific groups, immigrants and women, from a multilingual perspective covering Spanish and English.

This publication is the Intelligent Systems Group's first major achievement in hate speech detection: the system ranked a creditable fifth in Spanish sub-task A and was the best European system in that sub-task.

Abstract. This paper describes the GSI-UPM system for SemEval-2019 Task 5, which tackles multilingual detection of hate speech on Twitter. The main contribution of the paper is the use of a method based on word embeddings and semantic similarity combined with traditional paradigms, such as n-grams, TF-IDF and POS. This combination of several features is fine-tuned through ablation tests, demonstrating the usefulness of different features. While our approach outperforms baseline classifiers on different sub-tasks, the best of our submitted runs reached the 5th position on the Spanish sub-task A.
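To make the idea of combining feature families more concrete, here is a minimal sketch in Python. It is not the authors' implementation. It assumes that "semantic similarity" means the cosine similarity between a tweet's averaged word embedding and the centroid of a few seed words; the embeddings, seed words, and helper names (`TOY_EMBEDDINGS`, `SEED_WORDS`, `build_features`) are hypothetical, POS features and the classifier are omitted, and only the standard library is used. Dropping a group name from `groups` mimics one step of an ablation test.

```python
import math
from collections import Counter

# Hypothetical 3-dimensional embeddings, invented for illustration only.
# A real system would load pretrained word embeddings instead.
TOY_EMBEDDINGS = {
    "hate": [0.9, 0.1, 0.0],
    "go": [0.2, 0.7, 0.1],
    "home": [0.1, 0.8, 0.2],
    "welcome": [0.0, 0.2, 0.9],
    "friends": [0.1, 0.3, 0.8],
}
# Hypothetical seed words whose centroid stands for "hateful" content.
SEED_WORDS = ["hate"]


def tokenize(text):
    return text.lower().split()


def ngrams(tokens, n_max=2):
    """Word n-grams of length 1..n_max, joined by spaces."""
    return [" ".join(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]


def idf_table(corpus, n_max=2):
    """Smoothed IDF, log((1 + N) / (1 + df)) + 1, for every corpus n-gram."""
    df = Counter()
    for doc in corpus:
        df.update(set(ngrams(tokenize(doc), n_max)))
    n_docs = len(corpus)
    return {g: math.log((1 + n_docs) / (1 + c)) + 1 for g, c in df.items()}


def tfidf(text, idf, n_max=2):
    """TF-IDF weights; n-grams never seen in the corpus are skipped."""
    counts = Counter(ngrams(tokenize(text), n_max))
    total = sum(counts.values()) or 1
    return {g: (c / total) * idf[g] for g, c in counts.items() if g in idf}


def mean_vector(tokens):
    """Average embedding of the known tokens, or None if none are known."""
    vecs = [TOY_EMBEDDINGS[t] for t in tokens if t in TOY_EMBEDDINGS]
    if not vecs:
        return None
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(len(vecs[0]))]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def semantic_similarity(text):
    """Cosine similarity between the tweet vector and the seed centroid."""
    tweet_vec = mean_vector(tokenize(text))
    seed_vec = mean_vector(SEED_WORDS)
    if tweet_vec is None or seed_vec is None:
        return 0.0
    return cosine(tweet_vec, seed_vec)


def build_features(text, idf, groups=("tfidf", "semsim")):
    """Merge the enabled feature groups into one sparse feature dict."""
    features = {}
    if "tfidf" in groups:
        features.update({"tfidf:" + g: w for g, w in tfidf(text, idf).items()})
    if "semsim" in groups:
        features["semsim"] = semantic_similarity(text)
    return features


corpus = ["go home", "welcome friends", "i hate them go home"]
idf = idf_table(corpus)
full = build_features("hate go home", idf)
ablated = build_features("hate go home", idf, groups=("tfidf",))
print(sorted(full))  # ['semsim', 'tfidf:go', 'tfidf:go home', 'tfidf:hate', 'tfidf:home']
print("semsim" in ablated)  # False
```

In a full pipeline, these sparse dicts would be vectorized and fed to a classifier, and comparing scores with and without each group is what an ablation test measures.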

The SemEval-2019 workshop was held on June 6-7, 2019, in Minneapolis, USA, co-located with the Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2019).