Abstract
To address the prominent problem of duplicate project approval in the management of scientific and technological information in the oil and gas pipeline field, this paper studies and analyzes the characteristic indicators and elements that make pipeline science and technology project information similar, and implements similarity detection by means of information technology to help ensure high-quality project approval. Exploiting the specialized nature of the domain, the paper builds a domain-specific synonym thesaurus to supplement and extend the existing Cilin thesaurus (词林). It obtains sentence dependency structure information through parsing and uses dependency paths to characterize overall sentence semantics more accurately. Building on word similarity computed by combining HowNet (知网) and Cilin, it incorporates sentence dependency structure information to compute text similarity. Experiments were conducted on a general text dataset and on a domain-specific text dataset; the method achieved an accuracy of 78.64% on the general dataset and 71% on the domain-specific dataset. Applied to similarity detection of scientific and technological information in the oil and gas pipeline field, the method met the application requirements well.
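The abstract does not give the formulas, so the following is only a minimal illustrative sketch of the general idea, not the authors' implementation. It assumes word similarity is a weighted average of a HowNet-based score and a thesaurus-based score, and that sentences are compared through their dependency triples. The lookup tables, their values, the weight `alpha`, and every function name are hypothetical stand-ins.

```python
from typing import Dict, FrozenSet, List, Tuple

# A dependency triple: (head word, relation label, dependent word).
Triple = Tuple[str, str, str]
SimTable = Dict[FrozenSet[str], float]

# Hypothetical toy scores standing in for HowNet and thesaurus lookups.
HOWNET_SIM: SimTable = {
    frozenset({"pipeline", "pipe"}): 0.9,
    frozenset({"detect", "inspect"}): 0.8,
}
CILIN_SIM: SimTable = {
    frozenset({"pipeline", "pipe"}): 0.8,
    frozenset({"detect", "inspect"}): 0.9,
    frozenset({"corrosion", "rust"}): 0.7,  # e.g. a domain-specific entry
}


def lookup(table: SimTable, a: str, b: str) -> float:
    """Return the stored similarity, 1.0 for identical words, else 0.0."""
    if a == b:
        return 1.0
    return table.get(frozenset({a, b}), 0.0)


def word_sim(a: str, b: str, alpha: float = 0.5) -> float:
    """Weighted combination of the two lexical scores (assumed scheme)."""
    return alpha * lookup(HOWNET_SIM, a, b) + (1 - alpha) * lookup(CILIN_SIM, a, b)


def triple_sim(t1: Triple, t2: Triple) -> float:
    """Triples match only on the same relation; score is the product of
    head-word similarity and dependent-word similarity."""
    if t1[1] != t2[1]:
        return 0.0
    return word_sim(t1[0], t2[0]) * word_sim(t1[2], t2[2])


def directed_sim(src: List[Triple], dst: List[Triple]) -> float:
    """Average, over src triples, of the best match found in dst."""
    if not src or not dst:
        return 0.0
    return sum(max(triple_sim(s, d) for d in dst) for s in src) / len(src)


def sentence_sim(a: List[Triple], b: List[Triple]) -> float:
    """Symmetric sentence similarity: mean of both directed scores."""
    return 0.5 * (directed_sim(a, b) + directed_sim(b, a))


if __name__ == "__main__":
    s1 = [("detect", "dobj", "pipeline"), ("pipeline", "nmod", "corrosion")]
    s2 = [("inspect", "dobj", "pipe"), ("pipe", "nmod", "rust")]
    print(sentence_sim(s1, s2))
```

In practice the triples would come from a dependency parser and the lexical scores from HowNet and the extended thesaurus; the toy tables here only make the sketch self-contained.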
Authors
陈泽
段友祥
CHEN Ze; DUAN Youxiang (School of Computer Science and Technology, China University of Petroleum (East China), Qingdao 266580)
Source
《计算机与数字工程》
2022, No. 12, pp. 2731-2736 (6 pages)
Computer & Digital Engineering
Keywords
scientific and technological information
similarity
HowNet (知网)
Cilin thesaurus (词林)
dependency structure