
A Short-Text Similarity Calculation Method Based on Part of Speech and Keywords (cited by: 2)

Optimizing Word Mover's Distance Algorithm with Text Rank
Abstract: Word Mover's Distance (WMD) has been a very popular algorithm in recent years for computing the distance between texts. It measures text similarity fairly accurately and is widely applied in tasks such as public-opinion analysis and content classification. The key step in WMD is a bag-of-words representation in which each word is mapped to a 300-dimensional word vector. According to the paper, word weights are assigned randomly when the word vectors are obtained, so the accuracy of the retrieved similar texts is unstable. Building on WMD, the paper extracts keywords and uses part-of-speech classification to assign different weights to words of different parts of speech, which further optimizes WMD and improves classification accuracy.
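The idea in the abstract, weighting each word's mass by its part of speech before measuring the transport distance, can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration: the multipliers in `POS_WEIGHT`, the 2-dimensional `TOY_VECTORS` (the paper uses 300-dimensional vectors), and the function names are hypothetical, and the paper's actual weights and keyword-extraction step are not reproduced. To stay dependency-free, the sketch computes the relaxed WMD, a known lower bound on WMD in which each word sends all of its mass to its nearest counterpart, rather than solving the full transport problem.

```python
import math
from collections import Counter

# Hypothetical part-of-speech multipliers (illustrative values, not from the paper).
POS_WEIGHT = {"NOUN": 1.0, "VERB": 0.8, "ADJ": 0.5, "OTHER": 0.2}

# Hypothetical 2-D toy embeddings; real word vectors would have 300 dimensions.
TOY_VECTORS = {
    "cat": (0.0, 0.0),
    "dog": (3.0, 4.0),
    "runs": (1.0, 1.0),
    "the": (5.0, 5.0),
}


def weighted_nbow(tagged_tokens, pos_weight=POS_WEIGHT):
    """Map (word, pos) pairs to {word: mass}; masses are POS-weighted counts that sum to 1."""
    raw = Counter()
    for word, pos in tagged_tokens:
        raw[word] += pos_weight.get(pos, pos_weight["OTHER"])
    total = sum(raw.values())
    return {word: mass / total for word, mass in raw.items()}


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def relaxed_wmd(doc_a, doc_b, vectors):
    """Relaxed WMD (a lower bound on WMD) between two {word: mass} documents."""

    def one_sided(src, dst):
        # Each source word moves all of its mass to the closest destination word.
        return sum(
            mass * min(euclidean(vectors[w], vectors[x]) for x in dst)
            for w, mass in src.items()
        )

    # The tighter of the two one-sided relaxations.
    return max(one_sided(doc_a, doc_b), one_sided(doc_b, doc_a))


if __name__ == "__main__":
    a = weighted_nbow([("the", "OTHER"), ("cat", "NOUN"), ("runs", "VERB")])
    b = weighted_nbow([("the", "OTHER"), ("dog", "NOUN"), ("runs", "VERB")])
    print(a)
    print(relaxed_wmd(a, b, TOY_VECTORS))
```

With these multipliers, the function word "the" carries far less mass than the nouns, so a mismatch between "cat" and "dog" dominates the distance. In practice the (word, part-of-speech) pairs would come from a tagger and the vectors from a pretrained embedding model, and an exact WMD solver could replace `relaxed_wmd`.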
Author: Zhao Mingyue (赵明月) (School of Computer and Information Engineering, Henan University, Kaifeng, Henan 475004, China)
Source: Computer Era (《计算机时代》), 2018, No. 5, pp. 66-70, 73 (6 pages)
Keywords: part-of-speech classification; weight; keyword extraction; similarity