Abstract
Open access journals (OA journals) have important academic value, but some of them do not follow the OAI-PMH protocol, so their resources cannot be collected through it. For such journals, this paper proposes a resource acquisition strategy based on web information extraction. It first summarizes and classifies the characteristics of OA journal resources from the perspective of how the resources are described on web pages. Based on the applicability of web information extraction to OA journal resource acquisition, it then proposes an acquisition method that extracts metadata from OA journal web pages, and designs an acquisition system built on this method. An empirical test on the websites of 10 domestic and international OA journals that do not follow the OAI-PMH protocol yielded metadata for 45,785 papers, showing that the method can be applied effectively to this kind of resource. The study enriches the available approaches to OA journal resource acquisition and offers a methodological reference for acquiring resources from OA journals that do not follow the OAI-PMH protocol.
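The abstract does not describe the extraction rules themselves. As a rough illustration of what metadata extraction from a journal web page can look like, the sketch below assumes that an article page embeds its bibliographic data in `<meta>` tags whose names start with `citation_` or `dc.`. That tag convention, the class name `MetaTagExtractor`, the function `extract_metadata`, and the sample page are all assumptions made for this example, not details taken from the paper.

```python
from html.parser import HTMLParser


class MetaTagExtractor(HTMLParser):
    """Collect <meta name=... content=...> pairs whose name has a given prefix.

    Hypothetical helper for illustration; not the system described in the paper.
    """

    def __init__(self, prefixes=("citation_", "dc.")):
        super().__init__()
        self.prefixes = prefixes
        self.records = {}  # tag name -> list of values (e.g. repeated authors)

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        content = attr.get("content")
        # Keep only tags that look bibliographic and carry a non-empty value.
        if content and name.startswith(self.prefixes):
            self.records.setdefault(name, []).append(content.strip())


def extract_metadata(html):
    """Return a dict mapping metadata tag names to lists of values."""
    parser = MetaTagExtractor()
    parser.feed(html)
    return parser.records


# Made-up page used only to exercise the parser.
SAMPLE_PAGE = """
<html><head>
<meta name="citation_title" content="An Example Article">
<meta name="citation_author" content="Author One">
<meta name="citation_author" content="Author Two">
<meta name="citation_journal_title" content="Example Journal">
<meta name="viewport" content="width=device-width">
</head><body></body></html>
"""

if __name__ == "__main__":
    print(extract_metadata(SAMPLE_PAGE))
```

Journals that expose no such tags would need page-specific rules instead, which is presumably why the paper classifies journals by how their pages describe resources before extracting anything.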
Authors
黄政
张学福
HUANG Zheng; ZHANG XueFu (Agricultural Information Institute of Chinese Academy of Agricultural Sciences, Beijing 100081, China)
Source
《数字图书馆论坛》
CSSCI
2017, Issue 5, pp. 25-32 (8 pages)
Digital Library Forum
Keywords
Open Access Journal (OA期刊)
Open Access Journal Resource Acquisition (OA期刊资源采集)
Web Information Acquisition (网页信息采集)
Open Access Journal Resource Acquisition System (OA期刊资源采集系统)