The paper "GSI-UPM at SemEval-2019 Task 5: Semantic Similarity and Word Embeddings for Multilingual Detection of Hate Speech Against Immigrants and Women on Twitter", by Diego Benito, Óscar Araque, and Carlos A. Iglesias, has been published at the Thirteenth International Workshop on Semantic Evaluation (SemEval-2019).

The SemEval workshop focuses on the evaluation and comparison of systems that can analyse diverse semantic phenomena in text, with the aim of extending the current state of the art in semantic analysis and creating high-quality annotated datasets for a range of increasingly challenging problems in natural language semantics. In particular, SemEval-2019 Task 5 aims at detecting hate speech directed at two specific targets, immigrants and women, from a multilingual perspective covering Spanish and English.

The publication represents the first major achievement of the Intelligent Systems Group in the field of hate speech detection: the system reached an honorable fifth position in the Spanish sub-task A and was the best European system in the same sub-task.

Abstract. This paper describes the GSI-UPM system for SemEval-2019 Task 5, which tackles multilingual detection of hate speech on Twitter. The main contribution of the paper is the use of a method based on word embeddings and semantic similarity combined with traditional paradigms, such as n-grams, TF-IDF and POS. This combination of several features is fine-tuned through ablation tests, demonstrating the usefulness of different features. While our approach outperforms baseline classifiers on different sub-tasks, the best of our submitted runs reached the 5th position on the Spanish sub-task A.
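As a rough illustration of the idea in the abstract, the sketch below combines TF-IDF weights over word n-grams with one extra feature based on embedding similarity. It is not the GSI-UPM implementation: the toy two-dimensional embeddings, the seed word list, and the function names (`tfidf_features`, `similarity_feature`, `combine`) are all assumptions made for this example, and a real system would load pretrained word vectors and feed the resulting features to a classifier.

```python
import math
from collections import Counter

# Hypothetical toy embeddings; a real system would load pretrained vectors.
TOY_EMBEDDINGS = {
    "hate": [0.9, 0.1],
    "despise": [0.8, 0.2],
    "love": [0.1, 0.9],
    "people": [0.5, 0.5],
}


def ngrams(tokens, n):
    """Return the word n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf_features(docs, n_max=2):
    """Return (vocabulary, matrix) of TF-IDF weights over word 1..n_max-grams."""
    doc_terms = []
    for doc in docs:
        tokens = doc.lower().split()
        terms = []
        for n in range(1, n_max + 1):
            terms.extend(ngrams(tokens, n))
        doc_terms.append(Counter(terms))
    vocab = sorted({t for counts in doc_terms for t in counts})
    # Document frequency: each term is counted once per document.
    df = Counter(t for counts in doc_terms for t in counts)
    n_docs = len(docs)
    matrix = []
    for counts in doc_terms:
        total = sum(counts.values()) or 1
        # Smoothed IDF, so terms present in every document keep a non-zero weight.
        row = [
            (counts[t] / total) * (math.log((1 + n_docs) / (1 + df[t])) + 1)
            for t in vocab
        ]
        matrix.append(row)
    return vocab, matrix


def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def similarity_feature(doc, seed_words, embeddings):
    """Highest cosine similarity between any known token and any seed word."""
    tokens = [t for t in doc.lower().split() if t in embeddings]
    seeds = [s for s in seed_words if s in embeddings]
    return max(
        (cosine(embeddings[t], embeddings[s]) for t in tokens for s in seeds),
        default=0.0,
    )


def combine(docs, seed_words, embeddings):
    """Append the similarity feature to each document's TF-IDF row."""
    _, matrix = tfidf_features(docs)
    return [
        row + [similarity_feature(doc, seed_words, embeddings)]
        for row, doc in zip(matrix, docs)
    ]
```

Calling `combine(["I despise people", "I love people"], ["hate"], TOY_EMBEDDINGS)` yields one feature row per text, with the similarity score in the last column. An ablation test in this setting would amount to training the same classifier with and without a given group of columns and comparing the scores.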

The SemEval-2019 workshop was held on June 6-7, 2019 in Minneapolis, USA, co-located with the Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2019).