Abstract
[Objective] To design software that checks for duplicates in, and fuses, literature records from the SCI and EI databases. [Context] The software helps analysts obtain a fused dataset of SCI and EI records in a single format, better meeting the need of micro-level subject information analysis to build journal literature datasets flexibly from multiple sources. [Methods] Two automatic algorithms and one semi-automatic algorithm are used to check SCI and EI records for duplicates accurately. Data fusion is based on a detailed, field-level text analysis of the full records of both databases. [Results] The software automatically marks records duplicated between SCI and EI and generates a de-duplicated, fused data table. [Conclusions] The software effectively solves the problem of building a unified analysis dataset from two different journal literature sources, and its design ideas can also be applied to other databases.
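The abstract does not say which matching rules the two automatic algorithms and the semi-automatic algorithm use. As a rough illustration only, the Python sketch below assumes one plausible arrangement: an automatic match on identical DOI, an automatic match on identical normalized title plus year, and a fuzzy title comparison whose near matches are queued for manual review. The field names (`title`, `year`, `doi`), the `source` tag, the 0.9 threshold, and both function names are hypothetical and do not come from the paper.

```python
import re
from difflib import SequenceMatcher


def normalize_title(title):
    """Lowercase a title and collapse punctuation and whitespace to single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def find_duplicates(sci_records, ei_records, review_threshold=0.9):
    """Return (auto_matches, review_candidates) as lists of (sci_index, ei_index)."""
    auto_matches, review_candidates = [], []
    matched_ei = set()
    for i, sci in enumerate(sci_records):
        sci_doi = (sci.get("doi") or "").lower()
        sci_title = normalize_title(sci.get("title", ""))
        best_j, best_ratio, found = None, 0.0, False
        for j, ei in enumerate(ei_records):
            if j in matched_ei:
                continue
            ei_doi = (ei.get("doi") or "").lower()
            ei_title = normalize_title(ei.get("title", ""))
            # Assumed automatic rule 1: identical non-empty DOI.
            same_doi = bool(sci_doi) and sci_doi == ei_doi
            # Assumed automatic rule 2: identical normalized title and same year.
            same_title = (bool(sci_title) and sci_title == ei_title
                          and sci.get("year") == ei.get("year"))
            if same_doi or same_title:
                auto_matches.append((i, j))
                matched_ei.add(j)
                found = True
                break
            # Assumed semi-automatic rule: remember the most similar title.
            ratio = SequenceMatcher(None, sci_title, ei_title).ratio()
            if ratio > best_ratio:
                best_j, best_ratio = j, ratio
        if not found and best_j is not None and best_ratio >= review_threshold:
            review_candidates.append((i, best_j))  # left for a human to confirm
    return auto_matches, review_candidates


def fuse(sci_records, ei_records, auto_matches):
    """Build one table: every SCI record, plus the EI records not matched automatically."""
    duplicate_ei = {j for _, j in auto_matches}
    fused = [dict(r, source="SCI") for r in sci_records]
    for i, _ in auto_matches:
        fused[i]["source"] = "SCI+EI"  # mark records indexed by both databases
    fused += [dict(r, source="EI")
              for j, r in enumerate(ei_records) if j not in duplicate_ei]
    return fused


# Made-up sample records, for illustration only.
sci = [
    {"title": "Data Fusion of SCI and EI Records", "year": 2014, "doi": "10.1000/abc"},
    {"title": "A Study of Citation Networks.", "year": 2013, "doi": ""},
    {"title": "Duplicate checking for bibliographic databases", "year": 2012, "doi": ""},
]
ei = [
    {"title": "Data fusion of SCI and EI records", "year": 2014, "doi": "10.1000/ABC"},
    {"title": "A study of citation networks", "year": 2013, "doi": ""},
    {"title": "Duplicate checking for bibliographic database", "year": 2012, "doi": ""},
    {"title": "Unrelated paper on optics", "year": 2011, "doi": ""},
]
auto, review = find_duplicates(sci, ei)
fused = fuse(sci, ei, auto)
```

In this sketch, pairs in `review` stay as separate rows in the fused table until a person confirms them, which is one way to read "semi-automatic"; the paper's actual procedure may differ.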
Source
《现代图书情报技术》
CSSCI
Peking University Core Journals (北大核心)
2014, No. 11, pp. 79-87 (9 pages)
New Technology of Library and Information Service
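Funding
One of the research outcomes of the Young Talent Frontier Project of the National Science Library, Chinese Academy of Sciences, "Optimized Design of Auxiliary Tools for Subject-Oriented Knowledge Services" (Project No. 青1209)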
Keywords
Duplicate checking
Data fusion
EI
SCI
Software design