Abstract
This paper studies similarity measurement for scientific and technical (S&T) literature from two perspectives: connotation and extension. First, from the connotation perspective, it builds on character-level matching of feature words and applies a generalization method that matches the keywords of each candidate document against the keywords of the current document and their parent/child keywords. Second, from the extension perspective, it takes the characteristics of S&T literature into account and introduces co-citation into the similarity measure. Finally, it defines a hybrid similarity (HS) from the keyword generalization similarity and the co-citation correlation, and uses HS to rank and recommend candidate S&T documents. Theoretical analysis and experimental data indicate that the algorithm can, to some extent, avoid missing S&T documents whose feature words differ as character strings but are semantically similar.
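The abstract does not give the formulas, so the following is only a minimal sketch of how the three steps could fit together. Everything concrete in it is an assumption made for illustration: the `PARENT` hierarchy, the `relative_weight` given to parent/child matches, the use of Jaccard overlap as the co-citation measure, and the linear weight `alpha`. None of these is taken from the paper.

```python
from typing import Dict, Set

# Hypothetical keyword hierarchy (child -> parent); not taken from the paper.
PARENT: Dict[str, str] = {
    "collaborative filtering": "recommender system",
    "content-based filtering": "recommender system",
}


def generalize(keyword: str) -> Set[str]:
    """Return the keyword together with its parent and its children."""
    related = {keyword}
    if keyword in PARENT:
        related.add(PARENT[keyword])
    related.update(child for child, parent in PARENT.items() if parent == keyword)
    return related


def keyword_similarity(current: Set[str], candidate: Set[str],
                       relative_weight: float = 0.5) -> float:
    """Keyword generalization similarity (assumed form).

    An exact string match scores 1.0; a match against a parent or child
    of a current keyword scores relative_weight. The total is averaged
    over the candidate's keywords.
    """
    if not candidate:
        return 0.0
    score = 0.0
    for kw in candidate:
        if kw in current:
            score += 1.0
        elif any(kw in generalize(c) for c in current):
            score += relative_weight
    return score / len(candidate)


def cocitation(citing_a: Set[str], citing_b: Set[str]) -> float:
    """Co-citation correlation, assumed here to be the Jaccard overlap of
    the sets of documents that cite each of the two documents."""
    union = citing_a | citing_b
    return len(citing_a & citing_b) / len(union) if union else 0.0


def hybrid_similarity(kw_sim: float, cocite: float, alpha: float = 0.7) -> float:
    """Hybrid similarity (HS) as an assumed linear combination."""
    return alpha * kw_sim + (1.0 - alpha) * cocite


# Usage: the candidate shares one keyword exactly and one only via the hierarchy.
current_kw = {"recommender system", "similarity measurement"}
candidate_kw = {"collaborative filtering", "similarity measurement"}
kw_sim = keyword_similarity(current_kw, candidate_kw)
cc = cocitation({"p1", "p2", "p3"}, {"p2", "p3", "p4"})
hs = hybrid_similarity(kw_sim, cc)
```

With pure string matching, "collaborative filtering" would contribute nothing to the score; in this sketch it earns partial credit because its parent, "recommender system", is a keyword of the current document. That is the kind of case the abstract says the algorithm is meant to catch.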
Source
《情报理论与实践》
CSSCI
PKU Core (Peking University core journal list)
2013, No. 2, pp. 96-99, 103 (5 pages in total)
Information Studies: Theory & Application
Funding
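Ministry of Education Humanities and Social Sciences Research Youth Fund project "Research on Several Issues in S&T Literature Recommendation Systems" (Project No. 09YJC870001)
Research outcome of the Ministry of Education Humanities and Social Sciences Research Planning Fund project "Research on User Privacy Protection in Enterprise Data Outsourcing Services in Cloud Computing Environments" (Project No. 12YJA630136)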
Keywords
S&T literature
semantic relationship
similarity measurement
algorithm