The article "To Click It or Not to Click It: An Italian Dataset for Neutralising Clickbait Headlines" was presented at the Tenth Italian Conference on Computational Linguistics (CLiC-it 2024). The paper, authored by Daniel Russo, Oscar Araque, and Marco Guerini, received the Best Student Paper Award at the conference.
Link to AILC post.
Abstract:
Clickbait is a common technique aimed at attracting a reader’s attention, although it can result in inaccuracies and lead to misinformation. This work explores the role of current Natural Language Processing methods to reduce its negative impact. To do so, a novel Italian dataset is generated, containing manual annotations for classification, spoiling, and neutralisation of clickbait. Besides, several experimental evaluations are performed, assessing the performance of current language models. On the one hand, we evaluate the performance in the task of clickbait detection in a multilingual setting, showing that augmenting the data with English instances largely improves overall performance. On the other hand, the generation tasks of clickbait spoiling and neutralisation are explored. The latter is a novel task, designed to increase the informativeness of a headline, thus removing the information gap. This work opens a new research avenue that has been largely uncharted in the Italian language.