
A project similarity calculation method based on semantics and TF-IDF (cited by: 8)
Abstract: TF-IDF (term frequency-inverse document frequency) is a traditional statistics-based method for calculating text similarity. Because it ignores the semantic information of words, it cannot accurately reflect the similarity between texts. To address this problem, the paper proposes a similarity calculation method for science and technology projects that combines semantic understanding with TF-IDF. After word segmentation of the projects, HowNet (《知网》) is used to compute the semantic similarity between the feature terms of two projects, and TF-IDF is used to compute the weight of each feature term. The feature terms whose weight exceeds a given threshold are then weighted to obtain the project similarity value. Experimental results show that the method performs better than either pure TF-IDF or a purely semantic method.
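Authors: Zhao Shijie (赵士杰), Chen Qiu (陈秋)
Source: Computer Era (《计算机时代》), 2015, No. 5, pp. 1-3, 6 (4 pages)
Funding: 2013 Zhejiang Provincial Public Welfare Technology Application Research Project "Research and Implementation of Semantics-Based Duplicate Checking for Science and Technology Projects" (2013C33G2040027), 2013-2014
Keywords: semantic understanding; HowNet; weight of feature; similarity calculation; TF-IDF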
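
The pipeline outlined in the abstract (TF-IDF weights per feature term, a weight threshold, then a weighted combination of term-level semantic similarities) can be illustrated with a minimal Python sketch. The abstract does not give the paper's formulas, so everything below is an assumption made for illustration: the smoothed IDF variant, the best-match aggregation in `project_similarity`, and the symmetric averaging. `toy_term_sim` is a hypothetical stand-in for a HowNet-based term similarity and uses a hand-made lookup table, not HowNet itself. Documents are assumed to be already segmented into tokens.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumption: tf = count / doc length, idf = ln((1 + n) / (1 + df)) + 1.
    The paper may use a different TF-IDF variant.
    """
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            t: (c / len(doc)) * (math.log((1 + n) / (1 + df[t])) + 1.0)
            for t, c in counts.items()
        })
    return weights


# Hypothetical term-similarity table standing in for HowNet scores.
TOY_SIM = {frozenset(("similarity", "resemblance")): 0.9}


def toy_term_sim(a, b):
    """Return 1.0 for identical terms, a table value if listed, else 0.0."""
    if a == b:
        return 1.0
    return TOY_SIM.get(frozenset((a, b)), 0.0)


def project_similarity(w_a, w_b, term_sim, threshold=0.0):
    """Weighted similarity of two projects given their TF-IDF weights.

    Only features whose weight exceeds `threshold` are kept. Each kept
    feature is matched to its most similar feature on the other side,
    and the matches are averaged using the TF-IDF weights. The two
    directions are then averaged so the score is symmetric.
    """
    kept_a = {t: w for t, w in w_a.items() if w > threshold}
    kept_b = {t: w for t, w in w_b.items() if w > threshold}
    if not kept_a or not kept_b:
        return 0.0

    def directed(src, dst):
        total = sum(src.values())
        matched = sum(
            w * max(term_sim(t, u) for u in dst) for t, w in src.items()
        )
        return matched / total

    return 0.5 * (directed(kept_a, kept_b) + directed(kept_b, kept_a))


if __name__ == "__main__":
    docs = [
        ["text", "similarity"],
        ["text", "resemblance"],
        ["blockchain", "contract"],
    ]
    w = tf_idf(docs)
    print(project_similarity(w[0], w[1], toy_term_sim))
```

Passing a function that returns 1.0 only for identical terms in place of `toy_term_sim` reduces the sketch to a purely lexical comparison, which makes it easy to see how much the semantic component contributes for a given pair of projects.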