Abstract
Computing the similarity between scientific documents underpins applications such as literature retrieval and literature analysis, and its results directly affect the final performance of those applications. Co-citation information is an important feature that distinguishes scientific literature from ordinary text: it effectively reveals the relationships between documents and can be exploited to improve the validity and reliability of similarity calculation. Building on the vector space model, this paper introduces semantic features and co-citation features into the similarity calculation and proposes a hybrid model for optimizing scientific literature similarity. Validation on literature from seven subfields of information science, including university libraries, online public opinion, and information quality, shows that the proposed model makes full use of the co-citation features unique to scientific literature, compensates for the insufficient features of the vector space model, and improves the overall performance of literature similarity calculation.
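The abstract does not give the model's formula, so the following is only a minimal sketch of the general idea, assuming a simple linear blend of two components: a vector-space (term-frequency cosine) text similarity, and a co-citation similarity normalized with Salton's cosine, |A∩B| / sqrt(|A|·|B|), where A and B are the sets of papers citing each document. The function names, the weight `alpha`, and the choice of normalization are hypothetical illustrations, not the authors' method, and the semantic-feature component described in the paper is omitted.

```python
import math
from collections import Counter


def cosine_tf(tokens_a, tokens_b):
    """Cosine similarity of raw term-frequency vectors (the VSM component)."""
    va, vb = Counter(tokens_a), Counter(tokens_b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def cocitation_sim(citing_a, citing_b):
    """Co-citation similarity via Salton's cosine (assumed normalization).

    citing_a / citing_b are the sets of papers that cite document a / b;
    their intersection is the raw co-citation count.
    """
    if not citing_a or not citing_b:
        return 0.0
    return len(citing_a & citing_b) / math.sqrt(len(citing_a) * len(citing_b))


def hybrid_sim(tokens_a, tokens_b, citing_a, citing_b, alpha=0.5):
    """Hypothetical linear blend: alpha weights text, (1 - alpha) co-citation."""
    return alpha * cosine_tf(tokens_a, tokens_b) + (1 - alpha) * cocitation_sim(
        citing_a, citing_b
    )


if __name__ == "__main__":
    doc_a = "co citation similarity model".split()
    doc_b = "citation similarity vector model".split()
    citing_a = {"p1", "p2"}
    citing_b = {"p2", "p3"}
    print(hybrid_sim(doc_a, doc_b, citing_a, citing_b))  # prints 0.625
```

Here the text component is 3/4 = 0.75 and the co-citation component is 1/sqrt(4) = 0.5, so an equal weighting gives 0.625. A pair of documents with little shared vocabulary but many common citing papers would still score well, which illustrates how co-citation can supplement sparse text features.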
Authors
Han Qing (韩青)
Zhou Xiaoying (周晓英)
School of Information Resources Management, Renmin University of China, Beijing 100872
Source
Journal of the China Society for Scientific and Technical Information (《情报学报》)
CSSCI
CSCD
Peking University Core Journals (北大核心)
2018, No. 9, pp. 905-911 (7 pages)
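Funding
National Natural Science Foundation of China project "Research on Information Credibility and Quality Control of Medical and Health Websites" (71473260)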
Keywords
scientific literature similarity
co-citation
vector space model
hybrid model
algorithm optimization