Abstract:
Event detection was a field of research long before social networks reached the
impact they have today. Events were tracked through traditional news websites, blogs, and
other information channels. However, when microblogging emerged as a form of social media,
this landscape changed completely.
In this project we have developed a system capable of detecting the most important
events that have occurred in a city by analyzing data published on social networks. To this
end, we have adapted and improved an existing clustering approach named MABED, which relies
on the number of interactions between users to measure the impact of an event. Our main
contributions to this model have been to improve the accuracy of the impact algorithm and to
provide a new definition of redundancy that leads to better handling of duplicated events.
The social network our detector reads is Twitter, considered a valuable source of what is
known as Social Data. Information is provided by short documents posted by users,
called tweets. These publications are collected by our Streamer, which gathers posts that
have just been published in the city of Madrid.
In addition to the detector, we have also developed an architecture that turns our project
into a complete system. The Streamer is in charge of collecting the data that we feed to our
detector. However, the data first passes through a preprocessing module, which filters out
spam and lemmatizes the text in order to achieve better performance. Once the detection task
is finished, the results are saved in a persistence subsystem. These results are finally
visualized in a dashboard, which interacts with the user and makes the performed analysis
easier to interpret. The whole data flow is supervised by an orchestrator, which ensures the
correct interaction between modules.
The process we have just explained is repeated every half hour, showing the three
events with the highest impact that took place in the city of Madrid in the last 24
hours.
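
As an illustration only, the final selection step (keeping the three highest-impact
events of the last 24 hours) could be sketched in Python as follows. The Event class, its
fields, and the top_events function are hypothetical names introduced for this sketch and
are not taken from the system itself. The impact values are assumed to have been computed
already by the detector.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
import heapq


@dataclass
class Event:
    """Hypothetical record for a detected event (illustrative only)."""
    label: str
    impact: float          # assumed to be computed by the detector
    detected_at: datetime  # when the event was detected


def top_events(events, now, window=timedelta(hours=24), k=3):
    """Return the k events with the highest impact inside the time window."""
    cutoff = now - window
    # Keep only events detected within the window ending at `now`.
    recent = [e for e in events if cutoff <= e.detected_at <= now]
    # nlargest returns the selected events sorted by descending impact.
    return heapq.nlargest(k, recent, key=lambda e: e.impact)
```

In such a sketch, the orchestrator would call top_events once per half-hour cycle, passing
the current time as now.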
Key