
Research on Automatic Archiving System for Institutional Repositories (机构知识库自动存储系统研究)
Cited by: 2
Abstract: This paper introduces DAAS, an experimental system that automatically harvests articles by an institution's researchers from online literature databases and ingests their metadata into a local DSpace platform. The system implements a semi-automatic workflow for populating institutional repositories (IRs), consisting of information filtering, metadata extraction, copyright verification, metadata mapping, and data archiving. Building on Nutch core components, the paper describes in detail how DAAS parses URLs and applies rule-based filters, configured separately for each journal database, to extract bibliographic information from unstructured Web pages. Future work will focus on machine-learning algorithms.
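Author: 崔宇红 (Cui Yuhong)
Source: New Technology of Library and Information Service (《现代图书情报技术》), a CSSCI and Peking University core journal, 2010, no. 12, pp. 76-80 (5 pages)
Funding: One of the outcomes of the project "Research on Building Institutional Repositories", funded by the Beijing Institute of Technology Basic Research Fund (project no. 20061442003)
Keywords: Institutional repositories; Automatic archiving; Information extraction; Nutch; DSpace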
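The abstract lists five stages (information filtering, metadata extraction, copyright verification, metadata mapping, data archiving) and says that extraction relies on rule-based filters configured separately for each journal database. The Python sketch below illustrates that flow under stated assumptions. It is not the DAAS code and does not use Nutch. The database key `example_db`, the HTML class names and regular expressions, the sample page, the ISSN `1234-5678`, and the ISSN-to-colour table (a hypothetical local stand-in for a SHERPA/RoMEO lookup) are all invented for illustration. The output is a `dublin_core.xml` document in DSpace's Simple Archive Format, which is one possible batch-import route. Whether DAAS used this route or another ingest path is not stated here.

```python
"""Hypothetical sketch of a DAAS-style pipeline (not the original system)."""
import re
import xml.etree.ElementTree as ET

# Hypothetical rule set per journal database; each regex's first group is the value.
RULES = {
    "example_db": {
        "title": r'<h1 class="title">(.*?)</h1>',
        "authors": r'<span class="author">(.*?)</span>',
        "affiliation": r'<span class="affil">(.*?)</span>',
        "journal": r'<span class="journal">(.*?)</span>',
        "year": r'<span class="year">(\d{4})</span>',
        "issn": r'<span class="issn">(\d{4}-\d{3}[\dX])</span>',
    },
}
MULTI_VALUED = {"authors"}  # fields that keep every match, not just the first

# Hypothetical local stand-in for a SHERPA/RoMEO lookup: ISSN -> RoMEO colour.
ROMEO_COLOUR = {"1234-5678": "green"}
# Assumption: any other colour (including "white" for unknown ISSNs) blocks archiving.
ARCHIVABLE = {"green", "blue", "yellow"}

# Internal field -> (Dublin Core element, qualifier); mapping choices are illustrative.
DC_MAP = {
    "title": ("title", "none"),
    "authors": ("contributor", "author"),
    "year": ("date", "issued"),
    "journal": ("relation", "ispartof"),
    "issn": ("identifier", "issn"),
}


def extract(html, db):
    """Metadata extraction: apply the rule set for `db` to one result page."""
    record = {}
    for field, pattern in RULES[db].items():
        values = [v.strip() for v in re.findall(pattern, html, re.S)]
        if field in MULTI_VALUED:
            record[field] = values
        elif values:
            record[field] = values[0]
    return record


def belongs_to(record, institution):
    """Information filtering: keep records whose affiliation names the institution."""
    return institution.lower() in record.get("affiliation", "").lower()


def copyright_ok(record):
    """Copyright verification against the hypothetical colour table."""
    return ROMEO_COLOUR.get(record.get("issn"), "white") in ARCHIVABLE


def to_dublin_core_xml(record):
    """Metadata mapping: render the record as a Simple Archive Format dublin_core.xml string."""
    root = ET.Element("dublin_core")
    for field, (element, qualifier) in DC_MAP.items():
        value = record.get(field)
        for v in (value if isinstance(value, list) else [value]):
            if v:  # skip fields the rules did not find
                ET.SubElement(root, "dcvalue", element=element,
                              qualifier=qualifier).text = v
    return ET.tostring(root, encoding="unicode")


def process(html, db, institution):
    """Run all stages; return None if the record is filtered out or blocked."""
    record = extract(html, db)
    if not belongs_to(record, institution) or not copyright_ok(record):
        return None
    return to_dublin_core_xml(record)


SAMPLE = """
<h1 class="title">A Sample Article</h1>
<span class="author">Zhang San</span><span class="author">Li Si</span>
<span class="affil">Library, Beijing Institute of Technology</span>
<span class="journal">Sample Journal</span>
<span class="year">2010</span> <span class="issn">1234-5678</span>
"""

if __name__ == "__main__":
    print(process(SAMPLE, "example_db", "Beijing Institute of Technology"))
```

Because each database gets its own rule set, a new source can be supported by adding patterns rather than changing code. The drawback is that hand-written regular expressions break whenever a site changes its markup, which is one reason to explore the machine-learning extraction the abstract names as future work.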

References (13)

  • [1] Lynch C A. Institutional Repositories: Essential Infrastructure for Scholarship in the Digital Age [EB/OL]. [2010-08-31]. http://scholarship.utm.edu/21/1/Lynch,_IRs.pdf.
  • [2] OpenDOAR [EB/OL]. [2010-09-10]. http://www.opendoar.org/.
  • [3] CiteULike: Everyone's Library [EB/OL]. [2010-09-10]. http://www.citeulike.org/.
  • [4] Symplectic Elements - Publications Management System [EB/OL]. [2010-08-31]. http://www.symplectic.co.uk/products/publications.html.
  • [5] Ponomareva N, Gomez J M, Pekar V. AIR: A Semi-Automatic System for Archiving Institutional Repositories [EB/OL]. [2010-08-24]. http://clg.wlv.ac.uk/papers/AIR-system.pdf.
  • [6] SHERPA/RoMEO Home - Publisher Copyright Policies & Self-archiving [EB/OL]. [2010-10-04]. http://www.sherpa.ac.uk/romeo/.
  • [7] SWORD v2.0: Deposit Lifecycle [EB/OL]. [2010-10-04]. http://www.mopsl.com/oracle/event/pasig/downloads/SWORD-forDepositLifecycle presentation.pdf.
  • [8] Hanlon A. Asking for Permission: A Survey of Copyright Workflows for Institutional Repositories [EB/OL]. [2010-11-01]. http://works.bepress.com/marisa_ramirez/14/.
  • [9] Li H, Councill I G, Bolelli L, et al. CiteSeerX: A Scalable Autonomous Scientific Digital Library [C]. In: Proceedings of the 1st International Conference on Scalable Information Systems (INFOSCALE06), Hong Kong, China, 2006.
  • [10] Liu Lan, Wu Zhenxin, Xiang Jing, Sun Zhiru. A Review of Open-Source Software for Preserving Web Information Resources [J]. New Technology of Library and Information Service, 2009(5): 11-17. (Cited by: 14)
