Design and Development of a Stylometry Library for Texts in Spanish and English. Application to Terrorist and Radical Texts.

de Pablo Marsal, Á. (2019). Design and Development of a Stylometry Library for Texts in Spanish and English. Application to Terrorist and Radical Texts. Final Career Project (TFG). Universidad Politécnica de Madrid, ETSI Telecomunicación.

Abstract:
Every person writes in a style distinct from that of anyone else. This style is intrinsically tied to its author, and it also depends on the scope or purpose of the text. The field that studies it is called Stylometry, which characterizes style through different metrics, notably the Readability Index, Vocabulary Richness, Formality and Coherence. Each of these metrics can in turn be measured and interpreted in different ways. In this project, a Python Stylometry library has been designed and developed that measures the style of a text using different algorithms based on these metrics. The library analyzes texts written in Spanish and English; some of the metrics depend on the language because of the characteristics of each one (length of words, length of sentences...). For data visualization, a Dashboard based on web components has been developed, in which the user can select a text and see its style in a clear and convenient way. The library could serve several objectives: checking that the style of a text suits the audience that is going to read it, comparing the style of two different people (politicians, writers, influential people...), determining the source of a text, identifying its authorship, and more. In particular, this project focuses on using the library to compare news articles about terrorism with statements made by terrorist groups such as ETA or ISIS. If such texts can be told apart by style, radical texts could be identified and removed before they are published or consumed by Internet and social network users. In short, this Stylometry library can be used for different purposes based on the analysis of the style of texts.
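
To make the idea of a style metric concrete, the following is a minimal sketch in Python and not the library described above: the function names (tokenize, split_sentences, style_profile) are hypothetical, and the choice of metrics is an assumption. It computes a type-token ratio as one simple proxy for vocabulary richness, together with average word and sentence length, two quantities that readability formulas commonly build on and that differ between Spanish and English.

```python
import re


def tokenize(text):
    """Return lowercased word tokens (letters only, Unicode-aware)."""
    # [^\W\d_]+ matches runs of letters, so accented Spanish words stay whole.
    return re.findall(r"[^\W\d_]+", text.lower())


def split_sentences(text):
    """Naive sentence splitter on '.', '!' and '?' (an assumption, not robust)."""
    parts = re.split(r"[.!?]+", text)
    return [p.strip() for p in parts if p.strip()]


def style_profile(text):
    """Compute a few illustrative style metrics for a text."""
    words = tokenize(text)
    sentences = split_sentences(text)
    if not words or not sentences:
        raise ValueError("text must contain at least one word")
    return {
        # Distinct words divided by total words: a simple richness proxy.
        "type_token_ratio": len(set(words)) / len(words),
        # Mean number of letters per word.
        "avg_word_length": sum(len(w) for w in words) / len(words),
        # Mean number of words per sentence.
        "avg_sentence_length": len(words) / len(sentences),
    }


if __name__ == "__main__":
    print(style_profile("The cat sat. The cat ran!"))
```

Profiles computed this way for two texts can be compared metric by metric. The type-token ratio is sensitive to text length, so such a comparison is only meaningful between texts of similar size.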