Abstract:
Lexical resources are widely used in the field of Sentiment Analysis, as they
directly encode sentiment knowledge. Sentiment lexica are typically used for
polarity estimation by matching the words in a text against lexicon entries and
their associated sentiment polarities. Nevertheless, such resources are limited
in vocabulary coverage and domain adaptation.
In addition, many recent techniques exploit the concept of distributed semantics,
typically through word embeddings. In this work, a semantic similarity metric
is computed between the words of a text and the lexicon vocabulary. Using this
metric, this paper proposes a sentiment classification model that combines the
semantic similarity measure with embedding representations. To assess the
effectiveness of this model, we perform an extensive evaluation. Experiments
show that the proposed method improves Sentiment Analysis performance
over a strong baseline, and that this improvement is statistically significant.
Finally, some characteristics of the proposed technique are studied, showing
that the selection of lexicon words affects cross-dataset performance.
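
As an illustration of the general idea, the following is a minimal sketch, not the method evaluated in this paper. It assumes cosine similarity as the semantic similarity metric, max-pooling over the words of a text, and a mean-of-embeddings text representation; none of these choices is stated in the abstract. The toy embeddings, the two-word lexicon, and the function names are hypothetical.

```python
import numpy as np

# Toy 3-dimensional embeddings; every value here is invented for illustration.
EMBEDDINGS = {
    "good":  np.array([1.0, 0.0, 0.0]),
    "great": np.array([0.9, 0.1, 0.0]),
    "bad":   np.array([-1.0, 0.0, 0.0]),
    "awful": np.array([-0.9, 0.1, 0.0]),
    "movie": np.array([0.0, 1.0, 0.0]),
}

# Hypothetical selection of lexicon words; "great" and "awful" are not in it.
LEXICON = ["good", "bad"]
EMBEDDING_DIM = 3


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def similarity_features(tokens, lexicon=LEXICON, embeddings=EMBEDDINGS):
    """One feature per lexicon word: the maximum cosine similarity between
    that lexicon word and any in-vocabulary word of the text (assumed pooling)."""
    known = [t for t in tokens if t in embeddings]
    feats = np.zeros(len(lexicon))
    for j, lex_word in enumerate(lexicon):
        sims = [cosine(embeddings[t], embeddings[lex_word]) for t in known]
        if sims:
            feats[j] = max(sims)
    return feats


def document_vector(tokens, lexicon=LEXICON, embeddings=EMBEDDINGS):
    """Concatenate the mean word embedding with the similarity features."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    mean_emb = np.mean(known, axis=0) if known else np.zeros(EMBEDDING_DIM)
    return np.concatenate([mean_emb, similarity_features(tokens, lexicon, embeddings)])


positive = similarity_features(["great", "movie"])
negative = similarity_features(["awful", "movie"])
```

In this sketch, "great" and "awful" do not appear in the lexicon, yet each text still receives a high similarity score for the lexicon word with the matching polarity, which is how a similarity-based feature can ease the vocabulary coverage limitation of exact matching. The resulting vector from `document_vector` could then be passed to any standard classifier.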