Abstract
Entities are the basic units of event information and play an important role in events. In natural language processing, entity recognition is a fundamental tool for applications such as information extraction, syntactic analysis, machine translation, and discourse comprehension. The nesting of syntactic constituents that is characteristic of Chinese makes entity expressions complex and harder to recognize, so existing named entity recognition methods do not perform well on long location entities. To study automatic entity extraction, this paper starts from the domain of event reports and takes the maximal location entity as its object: location entities were manually annotated in and extracted from a corpus of 325 news articles, their occurrence characteristics were analyzed, and a feasible extraction scheme is proposed on the basis of that analysis.
Source
《计算机与数字工程》
2011, No. 7, pp. 72-74, 165 (4 pages)
Computer & Digital Engineering
Funding
Supported by the Natural Science Foundation of Guangdong Province (Grant No. 9151027501000039)
Keywords
entity
event
maximal location entity
extraction