Abstract
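Semantic parsing of geographic information effectively addresses the semantic barrier between natural language and geographic information systems. Based on an analysis of how the description and representation of geographic entities differ between Chinese text and geographic information systems, and taking into account the linguistic characteristics of geographical named entity descriptions, this study establishes an annotation scheme and annotation specification for geographical named entities in Chinese text. Using GATE (General Architecture for Text Engineering) as the annotation platform, it builds a large-scale annotated corpus based on the "Encyclopedia of China Geography" (《中国大百科全书中国地理》), in order to address the current shortage of relevant standards and of large-scale standardized data.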
Semantic interpretation of geographic information in natural language can overcome the semantic barrier between natural language and geographic information systems (GIS). Annotation schemes and corpus annotation aim to analyze the specific linguistic structures of geographic information found in text and to establish the metadata describing them. First, the differences between the representation of geographical entities in Chinese text and in GIS are analyzed. Second, based on the linguistic characteristics of geographical named entities in Chinese text, an annotation scheme is presented and the annotation specification is given in detail. Finally, GATE (General Architecture for Text Engineering) is introduced as the annotation platform, and a large-scale annotated corpus, GeoCorpus, based on the "Encyclopedia of China Geography" is developed and evaluated. This study addresses the current lack of related specifications and standardized data.
Source
《测绘学报》
EI
CSCD
PKU Core Journal (北大核心)
2012, No. 1, pp. 115-120 (6 pages)
Acta Geodaetica et Cartographica Sinica
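Funding
National Natural Science Foundation of China (40971231)
Priority Academic Program Development of Jiangsu Higher Education Institutions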
Keywords
Chinese text
geographical named entities
annotation scheme
annotated corpus
natural language