Development of a Genre Classification System for Twitter based on Machine learning techniques

Carlos Fuentes. (2017). Development of a Genre Classification System for Twitter based on Machine learning techniques. Trabajo Fin de Titulación. ETSI Telecomunicación, Universidad Politécnica de Madrid.

Abstract:
In recent years, the Internet has progressively become part of people’s lives. It contains huge amounts of information, and accessing it has become a day-to-day activity. A large part of that information comes from social networks such as Twitter or Facebook. With the exponential growth in the number of users of those networks, the demand for characterizing those users’ attributes also grows. Attributes such as gender, age, or political beliefs can be very useful both to the users themselves and to large companies for their social analysis. This thesis is the result of a project whose objective was to develop and deploy a classification system that predicts the gender of the user of a Twitter account. To that end, a plug-in was developed for the Senpy platform, which allows the system to be run as a service. The implementation was written in Python. The development required supervised machine learning techniques as well as Natural Language Processing (NLP) tools, because the system must obtain and store data from the Internet for subsequent processing. To test and evaluate the different modules of the system, a data set composed of tweets from different users was used. The tweets are in two languages: Spanish and English.