Abstract
Semantic similarity computation plays an important role in natural language processing. In recent years, with the rise of deep learning, techniques that use word embeddings to compute semantic similarity have been widely adopted. Many models and methods for computing word embeddings have been proposed, but in these models each word corresponds to a single embedding. Natural language contains a large number of polysemous words, so these models cannot represent the semantic features of polysemous words well. This paper proposes a method for computing polysemous word embeddings that combines a topic model with a conventional word embedding model. First, a topic model is used to semantically annotate the polysemous words in the corpus. The annotated words are then treated as new words, and word embeddings are computed over the annotated corpus, which yields multiple embeddings for a single polysemous word. Experiments on both Chinese and English corpora show that the method accurately computes embeddings for the different senses of polysemous words and that the accuracy of semantic similarity computation improves markedly.
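The annotate-then-embed idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation. The per-token topic ids are assumed to come from some topic model and are hard-coded here. The `word#topic` tag format, the `polysemous` set, and the function names are hypothetical. A simple co-occurrence count stands in for a real word embedding model.

```python
from collections import Counter, defaultdict
from math import sqrt

def annotate(tokens, topics, polysemous):
    """Relabel each polysemous token with its topic id (e.g. 'bank' -> 'bank#1').

    `topics` holds one topic id per token; here they are assumed to be
    supplied by a topic model that is not shown.
    """
    return [f"{w}#{t}" if w in polysemous else w for w, t in zip(tokens, topics)]

def cooccurrence_vectors(sentences, window=2):
    """Count-based stand-in for a word embedding model (an assumption)."""
    vecs = defaultdict(Counter)
    for sent in sentences:
        for i, w in enumerate(sent):
            lo, hi = max(0, i - window), min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vecs[w][sent[j]] += 1
    return vecs

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Toy corpus: (tokens, assumed per-token topic ids).
corpus = [
    (["deposit", "money", "bank"], [0, 0, 0]),
    (["river", "bank", "water"], [1, 1, 1]),
]
tagged = [annotate(toks, tops, polysemous={"bank"}) for toks, tops in corpus]
vecs = cooccurrence_vectors(tagged)

# 'bank' now has one vector per annotated sense.
sim_senses = cosine(vecs["bank#0"], vecs["bank#1"])
sim_money = cosine(vecs["bank#0"], vecs["money"])
```

In this toy corpus the two tagged forms of "bank" share no context words, so they receive separate vectors. In practice the tagged corpus would be passed to a real embedding trainer rather than to the count-based stand-in used here.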
Source
《小型微型计算机系统》
CSCD
Peking University Core Journal (北大核心)
2016, No. 7, pp. 1417-1421 (5 pages)
Journal of Chinese Computer Systems
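Funding
Supported by the Open Fund General Program of the State Key Laboratory of Mathematical Engineering and Advanced Computing (2013A02)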
Keywords
word embedding
polysemous words
topic model
semantic similarity