Abstract
In academic literature retrieval, a system that returns the papers belonging to the domain of a user's query and ranks them by importance helps users get oriented in that domain quickly. This paper proposes a domain-oriented literature retrieval framework that combines citation network (link) analysis with content analysis to find and rank the important papers of a domain. The framework retrieves papers with a scoring function that has two components: (1) the importance of a paper within the queried domain, and (2) the relevance of the paper to that domain. First, a "community-core" discovery algorithm finds, on the citation network, a subset of papers related to the queried domain and computes an importance score for each of them. Second, a supervised non-negative matrix factorization algorithm uses these identified domain-related papers as prior knowledge to classify the remaining papers and assign each a score that measures its relevance to the queried domain. Experiments on real and synthetic datasets confirm the effectiveness of the method.
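The abstract does not give the form of the scoring function or the details of either algorithm, so the following is only a rough sketch of the two-part idea, not the authors' method. It assumes a random walk with restart as a stand-in for the importance computation (random walk appears among the keywords, but its exact role is not stated), a hypothetical linear blend `combined_score` with a made-up `weight`, and hand-picked relevance values in place of the supervised non-negative matrix factorization output.

```python
def random_walk_with_restart(citations, seeds, restart=0.15, iters=100):
    """Importance scores from a random walk that restarts at seed papers.

    citations: dict mapping each paper to the list of papers it cites.
    seeds: non-empty set of papers assumed to belong to the queried domain.
    """
    papers = sorted(set(citations) | {q for cited in citations.values() for q in cited})
    restart_mass = {p: (1.0 / len(seeds) if p in seeds else 0.0) for p in papers}
    score = dict(restart_mass)
    for _ in range(iters):
        nxt = {p: restart * restart_mass[p] for p in papers}
        for p in papers:
            cited = citations.get(p, [])
            if cited:
                # Spread this paper's score evenly over the papers it cites.
                share = (1.0 - restart) * score[p] / len(cited)
                for q in cited:
                    nxt[q] += share
            else:
                # Dangling paper (cites nothing): return its mass to the seeds.
                for s in seeds:
                    nxt[s] += (1.0 - restart) * score[p] / len(seeds)
        score = nxt
    return score


def combined_score(importance, relevance, weight=0.5):
    """Hypothetical linear blend of importance and domain relevance."""
    return {p: weight * importance[p] + (1.0 - weight) * relevance.get(p, 0.0)
            for p in importance}


# Toy citation network: A cites B and C, B cites C, D cites C.
citations = {"A": ["B", "C"], "B": ["C"], "C": [], "D": ["C"]}
importance = random_walk_with_restart(citations, seeds={"A"})
relevance = {"A": 1.0, "B": 0.8, "C": 0.9, "D": 0.1}  # made-up placeholder values
final = combined_score(importance, relevance)
ranking = sorted(final, key=final.get, reverse=True)
```

In this toy network, paper D is neither a seed nor cited by anyone, so the walk never reaches it and its importance stays at zero; only its relevance term contributes to its final score.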
Source
《软件学报》
EI
CSCD
Peking University Core Journals (北大核心)
2013, Issue 4, pp. 798-809 (12 pages)
Journal of Software
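Funding
National Natural Science Foundation of China (61170133)
Ministry of Education of China, Humanities and Social Sciences Youth Fund (09YJCZH101)
Fundamental Research Funds for the Central Universities (JBK120214)
Southwestern University of Finance and Economics, Project 211 Young Faculty Development Program (211QN10061)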
Keywords
non-negative matrix factorization
random walk
literature retrieval
citation network
link analysis