Research and Implementation of an Improved VSM-Based Text Similarity Algorithm (cited by: 35)
Abstract: By analyzing the shortcomings of the traditional text similarity algorithm based on the vector space model (VSM), this paper proposes an improved text similarity algorithm. The improved algorithm takes full account of how the feature words shared by two texts affect their similarity, and thereby effectively reduces interference from texts with low similarity. Simulation experiments and results from a running system verify the effectiveness and accuracy of the improved algorithm.
Source: Computer Applications and Software (《计算机应用与软件》), CSCD, Peking University Core Journal, 2012, No. 2, pp. 282-284 (3 pages)
Keywords: Vector space; Text similarity; Feature words; Coverage
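The abstract does not give the improved formula, so the following is only a minimal sketch of the general idea it describes: a traditional VSM cosine similarity, followed by a variant that is scaled down when two texts share few feature words. The `coverage` definition (shared distinct terms divided by all distinct terms) and the multiplicative combination in `coverage_weighted_similarity` are assumptions made for illustration, not the paper's method. Raw term counts stand in for whatever term weighting the paper uses.

```python
import math
from collections import Counter


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Traditional VSM similarity: cosine of the angle between two term-weight vectors."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)


def coverage(a: Counter, b: Counter) -> float:
    """Hypothetical coverage factor (an assumption): shared terms / all distinct terms."""
    all_terms = set(a) | set(b)
    if not all_terms:
        return 0.0
    return len(set(a) & set(b)) / len(all_terms)


def coverage_weighted_similarity(a: Counter, b: Counter) -> float:
    """Illustrative combination (an assumption): cosine similarity scaled by coverage."""
    return cosine_similarity(a, b) * coverage(a, b)


# Two texts dominated by one shared term but otherwise different.
doc_a = Counter({"vector": 10, "space": 1})
doc_b = Counter({"vector": 10, "retrieval": 1})

print(cosine_similarity(doc_a, doc_b))             # high: one heavy shared term dominates
print(coverage(doc_a, doc_b))                      # only 1 of 3 distinct terms is shared
print(coverage_weighted_similarity(doc_a, doc_b))  # pulled down by the low coverage
```

In this sketch, two texts that share a single heavily weighted term score close to 1 under plain cosine similarity, while the coverage factor lowers the score because most of their feature words differ. This is one plausible reading of how shared feature words could be used to suppress low-similarity texts.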