Abstract:
This thesis presents the results of a project whose objective is to design and develop
the following elements:
A comment collection system for social networks and recommendation sites.
GSI Crawler, a website that uses the previous system to collect and analyze
the comments from the different websites.
A service to schedule, monitor and administer the crawling
system.
First, the development of the scrapers used to collect comments will be described. A
scraper has been developed for each website. Facebook, Twitter and YouTube offer the
necessary information through a specific API. In contrast, Amazon, Yelp and
TripAdvisor do not offer an API from which the comments could be extracted, so a
custom scraper had to be developed for each of these websites.
Next, the development of GSI Crawler will be described. This website can be used to
analyze comments from any of the websites mentioned above. The user will choose the
type of analysis to carry out (Emotions, Sentiments or Fake Analysis) and will also
supply, for instance, a direct URL to a Yelp business, the ID of a Facebook fan page
or the ID of a YouTube video. GSI Crawler will download the comments belonging to this
element and then run the pertinent analysis using the Senpy tool. Once the analysis
is finished, a summary of the results will be shown, and the user will also be able
to review each comment one by one.
Finally, we gather the conclusions drawn from this project, the technologies we
learned during its development and possible lines of future work.