Abstract:
Each person writes in a style that differs from everyone else's. This style is intrinsically
linked to its author, but it also depends on the context and purpose for which a text is
written. The field that studies writing style is called Stylometry.
Stylometry analyzes style through a range of metrics, most notably the Readability Index,
Vocabulary Richness, Formality and Coherence. Each of these metrics can, in turn, be
measured and interpreted in different ways.
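As an illustration of how one of these metrics can be measured, the sketch below computes the type-token ratio (distinct words divided by total words), a common and simple measure of Vocabulary Richness. It is a minimal sketch under assumptions: the function names are hypothetical, and the library described here may use a different vocabulary-richness formula.

```python
import re


def tokenize(text: str) -> list[str]:
    # Hypothetical helper: lowercased runs of letters only. [^\W\d_] is
    # Unicode-aware in Python 3, so accented Spanish letters (á, ñ, ü) are kept.
    return re.findall(r"[^\W\d_]+", text.lower())


def type_token_ratio(text: str) -> float:
    """Vocabulary richness as distinct words (types) / total words (tokens)."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


print(type_token_ratio("the cat saw the dog"))  # 4 types / 5 tokens = 0.8
```

The raw type-token ratio falls as texts get longer, so comparisons are only meaningful between texts of similar length. This is one reason the same metric can be measured and interpreted in different ways.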
In this project, a Python Stylometry library that measures the style of a text has been
designed and developed, using different algorithms based on these metrics. The library can
analyze texts written in Spanish and English. Some of the metrics depend on the language,
because each language has its own characteristics (word length, sentence length, etc.).
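Word length and sentence length are the language-dependent characteristics mentioned above. The sketch below computes both for a single text. It is a minimal sketch under stated assumptions: sentence splitting is naive (a break after `.`, `!` or `?` followed by whitespace), and the names `sentences`, `words` and `length_profile` are hypothetical, not the library's API.

```python
import re


def sentences(text: str) -> list[str]:
    # Naive split after ., ! or ? followed by whitespace. Spanish opening
    # marks (¿, ¡) simply stay at the start of the following sentence.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def words(text: str) -> list[str]:
    # Letter-only tokens; punctuation such as ¿ and ¡ is excluded.
    return re.findall(r"[^\W\d_]+", text)


def length_profile(text: str) -> dict[str, float]:
    """Average word length (in characters) and sentence length (in words)."""
    sents = sentences(text)
    ws = words(text)
    return {
        "avg_word_length": sum(len(w) for w in ws) / len(ws) if ws else 0.0,
        "avg_sentence_length": len(ws) / len(sents) if sents else 0.0,
    }


print(length_profile("El gato duerme. ¿Dónde está el perro?"))
# {'avg_word_length': 4.0, 'avg_sentence_length': 3.5}
```

Because typical values differ between Spanish and English, a metric built on these averages would need language-specific reference values or thresholds before scores for the two languages can be compared.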
In addition, a Dashboard built with web components has been developed to visualize the
results. It lets the user select a text and see its style clearly and conveniently.
This library could serve several purposes: checking that a text's style suits the audience
that will read it, comparing the style of two different people (politicians, writers,
influencers...), determining the source of a text, identifying its author, and more.
In particular, this project has focused on using the library to compare news articles about
terrorism with statements issued by terrorist groups such as ETA or ISIS. If the two kinds of
text could be reliably told apart, radical texts could be identified and removed before they
are published or read by Internet and social network users.
In short, this Stylometry library can serve many different purposes, all based on analyzing
the style of texts.