Automatic Metadata Extraction of Scanned Journal based on Feature Analysis (original Chinese title: 基于特征分析的数字化期刊元数据自动抽取算法)

Cited by: 1
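Abstract: Metadata extraction is an indispensable step in digitizing print journals. Traditional manual extraction consumes substantial labor and resources and is very inefficient. For scanned journals, this paper proposes an automatic metadata extraction algorithm based on feature analysis of scanned pages. The algorithm analyzes features of each scanned page, such as its format, structure, and fonts, and performs extraction using a combination of rule-based and supervised machine learning methods. Experiments show that the algorithm achieves high precision and recall while significantly improving the efficiency of metadata indexing.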
Source: Journal of Intelligence (《情报杂志》), indexed in CSSCI and the Peking University Core Journals list, 2010, No. 3, pp. 143-146 (4 pages)
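
The abstract describes rule-based extraction over page features (format, structure, fonts) and evaluation by precision and recall, but does not give the actual rules or feature set. The following is only a minimal sketch of the general idea, not the paper's algorithm: the `Line` record, the `extract_title` rule (largest font in the upper part of the page), the `top_fraction` threshold, and the sample page are all hypothetical, and the supervised learning component is not shown.

```python
from dataclasses import dataclass


@dataclass
class Line:
    """One OCR text line from a scanned page (hypothetical layout record)."""
    text: str
    font_size: float  # estimated glyph height in points
    y: float          # vertical position: 0.0 = top of page, 1.0 = bottom
    centered: bool    # whether the line is roughly centered on the page


def extract_title(lines, top_fraction=0.4):
    """Illustrative rule: the title is the largest-font line near the top.

    This rule and the 0.4 threshold are assumptions for demonstration only.
    """
    candidates = [ln for ln in lines if ln.y <= top_fraction and ln.text.strip()]
    if not candidates:
        return None
    # Prefer larger fonts; break ties by centered lines, then by lines nearer the top.
    best = max(candidates, key=lambda ln: (ln.font_size, ln.centered, -ln.y))
    return best.text.strip()


def precision_recall(predicted, expected):
    """Precision and recall over sets of (field, value) pairs."""
    predicted, expected = set(predicted), set(expected)
    correct = len(predicted & expected)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(expected) if expected else 0.0
    return precision, recall


# Made-up sample page: a running header, a title line, and the start of an abstract.
page = [
    Line("Running header text", 9.0, 0.03, False),
    Line("Automatic Metadata Extraction of Scanned Journal", 18.0, 0.12, True),
    Line("Abstract  Metadata extraction is ...", 10.0, 0.25, False),
]

title = extract_title(page)
p, r = precision_recall(
    predicted=[("title", title), ("author", "unknown")],
    expected=[("title", "Automatic Metadata Extraction of Scanned Journal")],
)
```

Here one of the two predicted fields is correct, so precision is lower than recall. A real system would combine many such rules with a trained classifier, as the abstract indicates.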