Publication - To Click It or Not to Click It: An Italian Dataset for Neutralising Clickbait Headlines

Daniel Russo, Oscar Araque & Marco Guerini (2024). To Click It or Not to Click It: An Italian Dataset for Neutralising Clickbait Headlines. In Proceedings of the Tenth Italian Conference on Computational Linguistics (CLiC-it 2024).

Abstract:

Clickbait is a common technique aimed at attracting a reader’s attention, although it can result in inaccuracies and lead to misinformation. This work explores the role of current Natural Language Processing methods in reducing its negative impact. To do so, a novel Italian dataset is created, containing manual annotations for classification, spoiling, and neutralisation of clickbait. In addition, several experimental evaluations are performed, assessing the performance of current language models. On the one hand, we evaluate performance on the task of clickbait detection in a multilingual setting, showing that augmenting the data with English instances largely improves overall performance. On the other hand, the generation tasks of clickbait spoiling and neutralisation are explored. The latter is a novel task, designed to increase the informativeness of a headline, thus removing the information gap. This work opens a new research avenue that has been largely uncharted for the Italian language.