Abstract
Word vectors (Word2Vec) have become an important technique in natural language processing in recent years and have played a major role in the recent development of artificial intelligence. Word2Vec represents each word as a point in a vector space, which in turn gives the word a probabilistic representation. The Word Mover's Distance (WMD) algorithm is a special case of the Earth Mover's Distance and computes the minimum distance between sets of vectors. Building on these two algorithms, this paper applies a spatial-mapping preprocessing step to the word vectors and passes the result to WMD as input, which yields a similarity score between sentences. Experiments show that the method widens the gap between the distances of similar sentences and those of unrelated sentences, and that similar questions in an expert system lie closer together. The method therefore captures the semantic similarity between sentences more clearly, which helps improve the accuracy of short-text matching.
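To make the idea of a word-vector transport distance between sentences concrete, the following is a minimal sketch and not the authors' implementation. It computes the relaxed WMD, a well-known lower bound on the exact WMD in which each word sends all of its mass to its nearest word in the other sentence. The exact WMD instead solves a transportation problem, and the paper's spatial-mapping preprocessing is not described in the abstract, so neither is reproduced here. The function names and the two-dimensional `toy_vectors` are hypothetical and chosen only for illustration.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def one_sided_cost(src, dst, vectors):
    """Cost of moving every token in src to its nearest token in dst.

    Each token occurrence carries mass 1/len(src), which matches a
    normalized bag-of-words weighting.
    """
    weight = 1.0 / len(src)
    return sum(
        weight * min(euclidean(vectors[s], vectors[d]) for d in dst)
        for s in src
    )


def relaxed_wmd(sent_a, sent_b, vectors):
    """Relaxed WMD: the larger of the two one-sided costs.

    This is a lower bound on the exact Word Mover's Distance, not the
    exact value.
    """
    return max(
        one_sided_cost(sent_a, sent_b, vectors),
        one_sided_cost(sent_b, sent_a, vectors),
    )


# Hypothetical 2-D embeddings; real Word2Vec vectors have hundreds of dimensions.
toy_vectors = {
    "doctor": (1.0, 0.0),
    "physician": (0.9, 0.1),
    "visit": (0.0, 1.0),
    "see": (0.1, 0.9),
    "stock": (-1.0, 0.0),
    "price": (-0.9, -0.2),
}

similar = relaxed_wmd(["visit", "doctor"], ["see", "physician"], toy_vectors)
unrelated = relaxed_wmd(["visit", "doctor"], ["stock", "price"], toy_vectors)
print(similar < unrelated)
```

A smaller value indicates closer sentences, so a similarity score can be derived from the distance, for example by negating or inverting it. Which conversion the paper uses is not stated in the abstract.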
Authors
乔猛
刘慧君
梁光辉
QIAO Meng; LIU Huijun; LIANG Guanghui (Institute of Computer, Chongqing University, Chongqing 400000, China; Information Engineering University, Zhengzhou 450001, China)
Source
《信息工程大学学报》
2018, No. 4, pp. 447-452 (6 pages)
Journal of Information Engineering University
Funding
National Natural Science Foundation of China (Grant No. 61572518)
Keywords
expert system
word vector
WMD
spatial mapping
similarity calculation