Research of Text Similarity Based on HowNet (基于《知网》的文本相似度研究)

Cited by: 3
Abstract: Text similarity is commonly computed as the cosine of the angle between texts represented in the vector space model (VSM). This approach ignores the semantic similarity between the words in a text. In addition, the VSM vectors must be aligned before the cosine is computed, which makes the computation high-dimensional and costly. HowNet (《知网》), a widely studied Chinese knowledge base, makes it easy to compute the similarity between Chinese words. This paper uses HowNet to compute the similarity between the words in each text and improves the VSM accordingly: the TF-IDF values of a small number of feature words serve as the weights of the improved VSM vector, from which the similarity between texts is computed. A comparison of the dimensionality, recall, and precision of the original and the improved VSM shows that the improved algorithm markedly reduces computational complexity and improves both recall and precision.
Author: Yuan Xiaofeng (袁晓峰)
Source: Journal of Chengdu University (Natural Science Edition) (《成都大学学报(自然科学版)》), 2014, No. 3, pp. 251-253 (3 pages)
Keywords: HowNet; semantic similarity; VSM; text similarity
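
The abstract contrasts plain VSM cosine similarity with a variant that takes word-level semantic similarity into account and keeps only a few feature words per text. The record does not give the paper's actual algorithm, so the following is only a minimal sketch under stated assumptions: `word_sim` and its `TOY_SIM` table are hypothetical stand-ins for a HowNet-based word similarity (the values are invented), the smoothed IDF formula and the `top_k` feature selection are illustrative choices, and the semantic comparison uses the soft cosine measure, which is a swapped-in technique rather than the author's method.

```python
import math
from collections import Counter

# Hypothetical stand-in for a HowNet-based word similarity.
# The pairs and scores are invented for illustration only.
TOY_SIM = {
    frozenset(("car", "automobile")): 0.9,
    frozenset(("fast", "quick")): 0.8,
}

def word_sim(a, b):
    """Return a similarity in [0, 1]; identical words score 1.0."""
    if a == b:
        return 1.0
    return TOY_SIM.get(frozenset((a, b)), 0.0)

def tf_idf(docs):
    """Sparse TF-IDF vectors (term -> weight) for tokenized documents.

    Uses a smoothed IDF, log((1 + n) / (1 + df)) + 1, an assumed variant.
    """
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    return [
        {t: (c / len(d)) * (math.log((1 + n) / (1 + df[t])) + 1.0)
         for t, c in Counter(d).items()}
        for d in docs
    ]

def top_k(vec, k):
    """Keep the k highest-weighted feature words (ties broken by term)."""
    ranked = sorted(vec.items(), key=lambda kv: (-kv[1], kv[0]))
    return dict(ranked[:k])

def cosine(u, v):
    """Plain VSM cosine: only identical terms contribute."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def soft_dot(u, v, sim):
    """Inner product in which every term pair is weighted by sim(a, b)."""
    return sum(wu * wv * sim(a, b)
               for a, wu in u.items() for b, wv in v.items())

def soft_cosine(u, v, sim):
    """Soft cosine: related but non-identical terms also contribute."""
    denom = math.sqrt(soft_dot(u, u, sim)) * math.sqrt(soft_dot(v, v, sim))
    return soft_dot(u, v, sim) / denom if denom else 0.0

docs = [
    ["fast", "car", "road"],
    ["quick", "automobile", "road"],
    ["ripe", "apple", "tree"],
]
vecs = [top_k(v, 3) for v in tf_idf(docs)]
plain = cosine(vecs[0], vecs[1])
soft = soft_cosine(vecs[0], vecs[1], word_sim)
unrelated = cosine(vecs[0], vecs[2])
print(f"plain cosine: {plain:.3f}, soft cosine: {soft:.3f}")
```

In this toy corpus the first two documents share only the word "road", so the plain cosine is low, while the soft cosine also credits the near-synonym pairs. Restricting each vector to its `top_k` feature words bounds the number of term pairs that the semantic comparison has to visit.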