Abstract
This paper examines the characteristics of three implementation mechanisms for related literature, with the aim of building a more effective relevance database for Chinese scientific and technical literature. Drawing on the complete content feature algorithm, it uses a thesaurus-based classification vector space model (VSM) to preprocess related literature, and it constructs a relevance database of Chinese scientific and technical literature using the metallurgical industry as an example. The relevance judgments of the thesaurus-based classification VSM are evaluated by comparing the system's judgments with manual judgments and with the judgments of two other systems; the results indicate that the model achieves relatively high accuracy. Judging related literature with the complete content feature algorithm helps to improve the functions of knowledge discovery systems and to raise the level of knowledge services.
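The abstract does not give the details of the model, so the following is only a minimal sketch of the general idea, assuming that documents are segmented by forward maximum matching against a thesaurus, that each matched term maps to a classification label, and that relevance is scored by cosine similarity between the resulting classification vectors. The thesaurus entries, the class labels, and the function names `segment`, `class_vector`, and `cosine` are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

# Hypothetical mini-thesaurus mapping metallurgy terms to class labels.
# Both the terms and the labels are illustrative assumptions.
THESAURUS = {
    "高炉": "ironmaking",
    "炼铁": "ironmaking",
    "转炉": "steelmaking",
    "炼钢": "steelmaking",
    "连铸": "casting",
}
MAX_TERM_LEN = max(len(term) for term in THESAURUS)


def segment(text):
    """Forward maximum matching: keep thesaurus terms, skip other characters."""
    terms, i = [], 0
    while i < len(text):
        for size in range(min(MAX_TERM_LEN, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in THESAURUS:
                terms.append(candidate)
                i += size
                break
        else:
            i += 1  # no thesaurus term starts here; move on one character
    return terms


def class_vector(text):
    """Count how often each classification label occurs in the text."""
    return Counter(THESAURUS[term] for term in segment(text))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    doc_a = class_vector("高炉炼铁工艺")
    doc_b = class_vector("转炉炼钢与连铸")
    print(cosine(doc_a, doc_b))
```

In this sketch two documents count as related when their classification vectors point in similar directions, even if they share no literal terms; a real system would presumably add term weighting and a tuned similarity threshold, neither of which is specified in the abstract.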
Source
Library Journal (《图书馆杂志》)
CSSCI
PKU Core Journal (北大核心)
2016, No. 12, pp. 32-40 (9 pages)
Keywords
Classification vector space model (VSM of classification)
Classification-SIM
Thesaurus-based Chinese word segmentation
Relevance judgment
Related literature