Abstract
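Research on parsing geospatial information from text data sources has focused on annotating and extracting spatial semantic roles such as place-name entities and spatial relations. It has overlooked the rich temporal information, thematic event information, and the integrated spatial-temporal information that links them. This paper analyzes the linguistic characteristics of event descriptions in Chinese text and the spatial-temporal semantic features of events. Building on prior work on annotating place-name entities and spatial relations, it develops an annotation framework and annotation schema for the spatial-temporal information of events in Chinese text. Using GATE (General Architecture for Text Engineering) as the annotation platform and Web pages as the data source, it then builds an annotated corpus of event spatial-temporal information. The results provide standardized training and test data for the semantic parsing of geographic information in Chinese text.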
Text has become an important data source of geospatial information. Current research on the structured expression of geospatial information has focused on extracting spatial information, such as place names and spatial relations, from text. However, the abundant temporal information, event information, and spatial-temporal information in text have been ignored. In this paper, an annotation of the spatial-temporal information of events in Chinese text is proposed. First, the linguistic characteristics of the spatial-temporal information of events in Chinese text are analyzed. Then, an annotation schema is presented, and the annotation specification is described in detail. Finally, GATE (General Architecture for Text Engineering) is introduced as the annotation platform, and a large-scale annotated corpus built from a Web data source is developed and evaluated. This study addresses the current lack of related specifications and standard data for interpreting event and spatial-temporal information in Chinese text.
Source
《中文信息学报》
CSCD
Peking University Core Journal
2016, Issue 3, pp. 213-222 (10 pages)
Journal of Chinese Information Processing
Funding
National Natural Science Foundation of China (41401451, 40971231)
National High-Tech R&D Program of China (863 Program) (2012AA12A403-3)
Fundamental Research Funds for the Central Universities (JZ2014HGBZ0064)
Jiangsu Province Surveying, Mapping and Geoinformation Research Project (JSCHKY201502)
Keywords
Chinese text
spatial-temporal information
event
annotation schema
annotated corpus