Abstract
Based on a survey of the literature on numerical information extraction, this paper first presents a quantitative analysis of that literature from three aspects: document type, subject area, and high-frequency keywords. It then reviews and summarizes current research in terms of data source, extraction object, extraction methods and techniques, result evaluation, and application. The study finds that current research on numerical information extraction has five characteristics: news corpora and web pages are the main data sources; cardinal numbers and quantitative phrases are the main extraction objects; extraction methods are mainly rule-based; the evaluation metrics for extraction results are relatively narrow; yet the application areas are fairly broad. 4 figs. 3 tabs. 56 refs.
Source
Journal of Library Science in China (《中国图书馆学报》)
CSSCI
Peking University Core Journals (北大核心)
2014, No. 2, pp. 107-119 (13 pages)
Keywords
Numerical information
Numeric knowledge element
Numerical information extraction
Named entity recognition