Abstract
[Purpose/significance] In the era of big data, scientific data have become an important cornerstone of research output. Researchers increasingly need to consult the relevant scientific data while reading the scientific literature, and to find the related articles while browsing scientific data. Services that associate scientific literature with scientific data have therefore attracted wide attention in recent years as a way of organizing research output. [Method/process] This paper systematically reviews methods for associating scientific literature with scientific data and groups them into metadata-based, content-based and citation-structure-based approaches. It designs an association algorithm based on citation probes, implements it for the association of literature with particles in high energy physics, and discusses the coverage and accuracy of the algorithm. [Result/conclusion] By computing association degrees, the citation-probe-based inference method can discover more implicit associations and improve association coverage. In particular, it can produce association results for new documents in a timely manner, reducing the manual labeling workload.
Source
《情报理论与实践》
CSSCI
Peking University Core Journal (北大核心)
2019, Issue 10, pp. 151-156 (6 pages)
Information Studies: Theory & Application
Funding
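Research output of the Chinese Academy of Sciences Special Project for Literature and Information Capacity Building, "Association and Integration of Scientific Literature and Scientific Data in High Energy Physics: Taking the LHAASO Project as a Demonstration", and of the Chinese Academy of Sciences "Light of West China" Young Scholars Program project, "Research and Demonstration of a Knowledge-Discovery-Oriented System for Associating and Integrating Scientific Data and Scientific Literature".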
Keywords
scientific and technical literature
scientific data
citation probe
data association
application analysis