Abstract:
As technology evolves, information can be disseminated ever more quickly and cheaply.
Many online newspapers and magazines are now globally accessible
via the internet. The ease with which information is transmitted brings with it some risks
linked to “misinformation”. A clear example of this is terrorist recruitment, especially by
the Islamic State (IS).
Despite years of fighting this radical movement in Europe, the problem has not yet been
solved. Given the numerous attacks associated with this terrorist group, it is necessary to
develop new tools to counter its expansion. Technology is one of the key enablers of such
developments.
This thesis presents a tool for the automatic classification of news and articles
published in different web communication channels (mainly newspapers), intended to help
counter the propagation of radical Islamist ideologies. To this end, the work has been
divided into two parts.
In the first part, two models based on machine learning and natural language processing
techniques were developed to carry out the classification. The two models were compared,
and the one that generates the best results was selected for use.
The second part consisted of implementing a pipeline that extracts news and articles
from different web channels. These articles are then processed and classified by applying
the developed model. Finally, they are stored in a database and displayed in a graphical
interface that allows the visualisation of different results.