Abstract:
Twitter is a social network that allows its users to exchange messages of up to 280 characters, optionally accompanied by a photo, video and/or link. It has been used as a data source for numerous research studies on human behaviour. This study aims to analyse and characterize messages from Spanish-speaking users related to insomnia, the most common sleep disorder in our society. To this end, we provide two machine learning classifiers that identify users with insomnia and the cause they self-report. We also propose a novel feature extraction method that exploits the similarity measure that can be computed in word embedding models. To train these classifiers, a dataset of Spanish-language tweets containing the word insomnia was manually annotated. The annotated dataset is also used to draw conclusions about the geographical distribution of users with insomnia, their symptoms and the topics they discuss. In addition, a second dataset was collected, comprising two groups of users from Spain: those with insomnia and those without. By analysing the timelines of both groups, we identify the differences in their patterns of activity on Twitter.