Abstract
After reviewing existing methods for citation metadata extraction, the authors propose a new approach based on a common typesetting convention of bibliographies: citations within the same document follow a consistent style. The paper focuses on the automatic detection and segmentation of citation data, two tasks that have received little attention in previous research. It also examines how style consistency can be used to improve citation metadata tagging. Experimental results show that the proposed method performs well in citation detection, segmentation, and tagging.
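The abstract does not describe how style consistency is exploited. The following Python is a purely illustrative sketch of the general idea, not the authors' algorithm. It assumes that every citation in a reference section begins with the same kind of marker, such as `[1]`, `1.` or `(1)`. It picks the marker style that starts the most lines and then treats every other line as a continuation of the preceding citation. The pattern table and the function names (`MARKER_PATTERNS`, `dominant_marker`, `segment_citations`) are hypothetical.

```python
import re
from collections import Counter

# Hypothetical set of citation-start markers; a real system would learn or
# enumerate many more styles (author-year, superscripts, etc.).
MARKER_PATTERNS = {
    "bracket_number": re.compile(r"^\s*\[\d+\]"),  # [1] Author A. ...
    "dotted_number": re.compile(r"^\s*\d+\.\s"),   # 1. Author A. ...
    "paren_number": re.compile(r"^\s*\(\d+\)"),    # (1) Author A. ...
}


def dominant_marker(lines):
    """Return the name of the marker style that starts the most lines, or None."""
    counts = Counter()
    for line in lines:
        for name, pattern in MARKER_PATTERNS.items():
            if pattern.match(line):
                counts[name] += 1
    if not counts:
        return None
    return counts.most_common(1)[0][0]


def segment_citations(lines):
    """Split reference-section lines into citations.

    Assumption (illustrative only): all citations in one document share the
    same start marker, so only lines matching the dominant style open a new
    citation; any other non-empty line continues the previous citation.
    """
    name = dominant_marker(lines)
    non_empty = [line.strip() for line in lines if line.strip()]
    if name is None:
        # No recognizable marker style: fall back to one citation per line.
        return non_empty
    pattern = MARKER_PATTERNS[name]
    citations = []
    for line in lines:
        text = line.strip()
        if not text:
            continue
        if pattern.match(line) or not citations:
            citations.append(text)          # start of a new citation
        else:
            citations[-1] += " " + text     # wrapped continuation line
    return citations


if __name__ == "__main__":
    # Placeholder data, not real references.
    section = [
        "[1] Author A. Title of paper one.",
        "    Journal X, 2001.",
        "[2] Author B. Title of paper two.",
        "    Journal Y, 2002.",
    ]
    for citation in segment_citations(section):
        print(citation)
```

Voting over the whole section, rather than deciding line by line, is what makes the sketch depend on style consistency. A single odd line cannot change the chosen marker style, although a document that mixes marker styles would defeat this simple heuristic.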
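Source
Acta Scientiarum Naturalium Universitatis Pekinensis (Journal of Peking University, Natural Science Edition)
Indexed in: EI, CAS, CSCD, Peking University Core Journals
2010, No. 6, pp. 893-900 (8 pages)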
Funding
Supported by the National Key Technology R&D Program of China (2006BAH02A21)
Keywords
bibliographic metadata
style consistency
metadata extraction
digital library