Research on Repeatability Detection Methods for Scientific Research Projects in Survey and Design Enterprises Based on BM25
Abstract: Redundant research investment is increasingly evident in China's survey and design enterprises. It drains funds, staff time, reputation, and even the spirit of scientific inquiry, and it hinders the incubation of cutting-edge technologies. Automatically identifying redundant research topics, so that existing research outcomes are reused as fully as possible, is therefore imperative. Drawing on the basic theory of the BM25 algorithm and the data attributes of survey and design enterprises, this paper introduces feature values such as domain, specialty, and project leader, and proposes a redundancy detection method for research projects within an enterprise. The method has four steps: text preprocessing; building a matching library; computing the similarity between an input topic and each topic in the matching library with both the term frequency-inverse document frequency (TF-IDF) algorithm and the BM25 algorithm; and analyzing the results. Compared with TF-IDF, BM25 controls term weights through term-frequency saturation and field-length normalization. Its scores for research topics in new energy, engineering digitalization, and informatization are more discriminative, which helps surface highly similar texts across different domains and minimizes the risk of missing potential duplicate topics. The algorithm's computation time is under 0.1 s, which is adequate for commercial use, and it supports redundancy checks at project initiation and the assessment of overlap between outcomes. Technical R&D staff re-examined the results and found the accuracy sufficient for business management needs; the method has potential for wider adoption in the survey and design industry.
Authors: Wang Yang (王扬); Cao Dewei (曹德威); Wang Jiangang (王剑刚); Qian Feng (钱锋); Qian Changyun (钱常运) (Shanghai Investigation, Design & Research Institute Co., Ltd., Shanghai 200335, China)
Source: Science and Technology Management Research (《科技管理研究》), CSSCI, 2024, No. 4, pp. 167-174 (8 pages)
Keywords: scientific research project; project redundancy verification; survey and design enterprises; BM25; TF-IDF; text similarity
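
The scoring step described in the abstract (comparing an input topic against every topic in a matching library with both TF-IDF and BM25) can be illustrated with a minimal sketch. This is not the authors' implementation. The whitespace tokenizer, the parameter values k1 = 1.5 and b = 0.75, the specific IDF formulas, and the three sample topics are all assumptions chosen for illustration. Chinese text would need a word segmenter in the preprocessing step, and the abstract does not say how the domain, specialty, and project-leader features enter the score, so they are left out here.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical preprocessing: lowercase and split on whitespace.
    return text.lower().split()


def document_frequencies(tokenized_docs):
    # Number of documents in which each term appears at least once.
    df = Counter()
    for tokens in tokenized_docs:
        df.update(set(tokens))
    return df


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of each document against the query.

    k1 controls term-frequency saturation, b controls length normalization.
    Both defaults are common choices, not values taken from the paper.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    df = document_frequencies(tokenized)
    query_terms = set(tokenize(query))  # each query term counted once
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        length_norm = 1 - b + b * len(tokens) / avg_len
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Non-negative IDF variant (assumed, not from the paper).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # The fraction approaches (k1 + 1) as tf grows: saturation.
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores


def tfidf_cosine(query, docs):
    """Cosine similarity between TF-IDF vectors of the query and each doc."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    df = document_frequencies(tokenized)

    def vectorize(tokens):
        # Smoothed IDF so that unseen query terms do not divide by zero.
        return {t: c * (math.log((1 + n_docs) / (1 + df[t])) + 1)
                for t, c in Counter(tokens).items()}

    def norm(vec):
        return math.sqrt(sum(w * w for w in vec.values()))

    q_vec = vectorize(tokenize(query))
    sims = []
    for tokens in tokenized:
        d_vec = vectorize(tokens)
        dot = sum(w * d_vec.get(t, 0.0) for t, w in q_vec.items())
        denom = norm(q_vec) * norm(d_vec)
        sims.append(dot / denom if denom else 0.0)
    return sims


# Made-up matching library and input topic, for illustration only.
library = [
    "offshore wind farm foundation design optimization",
    "digital twin platform for hydropower engineering",
    "enterprise knowledge management information system",
]
query = "design optimization of offshore wind foundation"

bm25 = bm25_scores(query, library)
tfidf = tfidf_cosine(query, library)
best = max(range(len(library)), key=lambda i: bm25[i])
print("closest existing topic:", library[best])
```

In this sketch, raw TF-IDF weight grows linearly with term count, whereas the BM25 contribution of a single term is bounded by idf * (k1 + 1) and is scaled by document length relative to the library average. These are the two weight-control mechanisms the abstract credits for BM25's better discrimination.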