Abstract
The problem of information overload caused by the "paper explosion" has drawn increasing attention to research on automatic review systems. How to automatically select the important documents that reflect the development of knowledge is the primary problem such systems must solve. Starting from the factors that influence review authors' selection of references, this study mines the citation behavior of review authors for regularities in how they assess the value of documents, and constructs a document evaluation model for automatic review systems based on the learning-to-rank framework. Using Microsoft Academic Graph as the data source, an experimental data set is built, and the results are evaluated with two indicators, ΔP@K and NDCG@K. The experiments reveal two findings: (1) compared with the pointwise and listwise approaches, the pairwise approach is more suitable for training the optimal document evaluation model, reaching 0.274, 0.085, 0.738, and 0.831 on ΔP@100, ΔP@200, NDCG@100, and NDCG@200, respectively; (2) the knowledge-importance factor and the literature quality and influence factors contribute most to the improvement of the model and are the primary considerations when review authors evaluate the value of documents and choose references.
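As an illustration of the metric family named in the abstract, the following is a minimal Python sketch of standard Precision@K and NDCG@K. It is not the paper's code: the exact definition of ΔP@K and the relevance labels used in the study are given in the paper itself, and the graded labels below are purely hypothetical.

# Illustrative sketch only: standard Precision@K and NDCG@K.
# The paper's ΔP@K definition and actual relevance labels are not reproduced here.
import math

def precision_at_k(ranked_labels, k):
    """Fraction of the top-k ranked documents that are relevant (label > 0)."""
    top_k = ranked_labels[:k]
    return sum(1 for label in top_k if label > 0) / k

def dcg_at_k(ranked_labels, k):
    """Discounted cumulative gain over the top-k positions."""
    return sum((2 ** label - 1) / math.log2(i + 2)
               for i, label in enumerate(ranked_labels[:k]))

def ndcg_at_k(ranked_labels, k):
    """DCG normalized by the ideal (best possible) DCG for the same labels."""
    ideal = dcg_at_k(sorted(ranked_labels, reverse=True), k)
    return dcg_at_k(ranked_labels, k) / ideal if ideal > 0 else 0.0

if __name__ == "__main__":
    # Hypothetical graded relevance of documents in model-ranked order
    # (e.g., 2 = cited by the review, 1 = marginally relevant, 0 = not cited).
    ranking = [2, 0, 1, 2, 0, 1, 0, 0]
    print(precision_at_k(ranking, 5))  # 0.6
    print(ndcg_at_k(ranking, 5))       # ≈ 0.82

In a learning-to-rank setting such as the one described, higher NDCG@K means the documents actually cited by review authors are concentrated nearer the top of the model's ranking.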
Authors
Ding Heng; Ruan Jinglong (School of Information Management, Central China Normal University, Wuhan 430079)
Source
Journal of the China Society for Scientific and Technical Information (《情报学报》)
Indexed in CSSCI, CSCD, and the Peking University Core Journal list
2022, Issue 11, pp. 1199-1213 (15 pages)
Funding
National Natural Science Foundation of China, Young Scientists Fund project "Research on Automatic Review of Academic Literature Based on Deep Semantic Representation and Multi-Document Summarization" (71904058)
Fundamental Research Funds for the Central Universities project "Research on Information Interaction Behavior and Privacy Protection" (CCNU22QN017)
Keywords
automatic review
literature evaluation
multi-dimensional features
learning to rank