Abstract:
With the increasing popularity of large-scale Knowledge Graphs (KGs), many applications such as semantic analysis,
search and question answering need to link entity mentions in texts to entities in KGs. Because of polysemy in
natural language, entity disambiguation is a key problem in current research. Existing disambiguation methods have
considered entity prominence, context similarity and entity-entity relatedness to discriminate between ambiguous entities,
but they mainly work on document-level or paragraph-level texts containing rich contextual information, and rely on lexical
matching to compute context similarity. When applied to short texts containing limited contextual information, such
as web queries, questions and tweets, these conventional disambiguation methods are poorly suited to handling single entity
mentions and measuring context similarity. To improve the performance of disambiguation methods based on
context similarity on such short texts, we propose SCSNED, a disambiguation method based on the semantic similarity
between contextual words and informative words of entities in KGs. Specifically, we exploit both
knowledge-based and corpus-based semantic similarity methods for entity disambiguation with SCSNED. Moreover,
we propose Category2Vec, an embedding model that jointly learns word and category embeddings in order to
compute word-category similarity for entity disambiguation. We demonstrate the proposed methods with
illustrative examples, and evaluate them in a comparative experiment on entity disambiguation in real-world
web queries, questions and tweets. The experimental results have identified the effectiveness of different semantic
similarity methods, and demonstrated that the semantic similarity methods in SCSNED and Category2Vec improve
over the conventional context similarity baseline. We further compare the proposed approaches with state-of-the-art
entity disambiguation systems and show that they are among the best-performing
systems. In addition, one important feature of the proposed approaches using semantic similarity is their potential
application to any existing KG, since they mainly use the common features of entity descriptions and categories. Another
contribution of the paper is an updated survey of the background on entity disambiguation in KGs and semantic similarity
methods.
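
The core idea of ranking candidate entities by the semantic similarity between context words and each entity's informative words can be sketched as follows. This is only an illustrative sketch, not the SCSNED scoring function itself: the toy vectors, the candidate names, the helper names (`cosine`, `context_score`, `disambiguate`) and the choice of averaging the best match per context word are all assumptions made for this example.

```python
import math

# Toy word vectors, invented for illustration only (not learned embeddings).
VECTORS = {
    "fruit": (1.0, 0.1, 0.0),
    "pie": (0.9, 0.2, 0.1),
    "recipe": (0.8, 0.3, 0.0),
    "company": (0.0, 1.0, 0.2),
    "iphone": (0.1, 0.9, 0.3),
    "technology": (0.0, 0.8, 0.4),
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def context_score(context_words, entity_words):
    """Average, over context words, of the best similarity to any entity word.

    Words without a vector are skipped (an assumption of this sketch).
    """
    scores = []
    for c in context_words:
        if c not in VECTORS:
            continue
        sims = [cosine(VECTORS[c], VECTORS[e]) for e in entity_words if e in VECTORS]
        if sims:
            scores.append(max(sims))
    return sum(scores) / len(scores) if scores else 0.0


def disambiguate(context_words, candidates):
    """Return the candidate entity whose informative words best match the context."""
    return max(candidates, key=lambda name: context_score(context_words, candidates[name]))


# Hypothetical candidates for the mention "apple", each with informative words
# such as might be drawn from its KG description or categories.
CANDIDATES = {
    "Apple_(fruit)": ["fruit"],
    "Apple_Inc.": ["company", "technology"],
}

best = disambiguate(["pie", "recipe"], CANDIDATES)
print(best)
```

With these toy vectors, the short query context "pie recipe" selects the fruit sense even though neither context word appears among the entity's informative words, which is the case lexical matching cannot handle.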