Abstract
This paper implements a metadata extraction workflow that combines automatic extraction with manual proofreading in an acquisition and service system for open conference literature, and designs an automatic extractor for it. Tailored to the characteristics of open conference literature, the extractor integrates several basic extraction templates and makes it easy to build a processing template for a specific conference literature collection. It can automatically extract metadata from documents in multiple formats with relatively high accuracy.
Source
《情报理论与实践》 (Information Studies: Theory & Application)
CSSCI
Peking University Core Journals (北大核心)
2012, No. 9, pp. 117-119 (3 pages)
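Funding
Research output of the Chinese Academy of Sciences Science Digital Library Phase II early-start project "Acquisition and Service System for Open Resources of Important Conferences" and of the 12th Five-Year Plan key construction task "Construction of the Open Resource Service System".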
Keywords
open access
conference literature
metadata
information extraction