Abstract
Named entity recognition (NER) is a core task in many natural language processing (NLP) applications and has become an active research topic in recent years. This paper divides named entities into two classes: general named entities (GNEs) and domain named entities (DNEs). Building on MPO, a domain ontology constructed previously, we propose a Chinese domain named entity recognition (DNER) method that combines ontology-derived knowledge rules with a statistical model. The method formalizes ontology instances to obtain part-of-speech (POS) rule templates describing how entities are composed, and uses these templates together with a conditional random field (CRF) model to recognize domain named entities. Experimental results show that, compared with a purely statistical method, the proposed method significantly improves DNER performance, reaching an F-measure of 92.36%. The results also show that ontological knowledge rules, applied effectively, improve accuracy in recognizing the boundaries of domain entities and domain entities with special forms.
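As a rough illustration of how POS rule templates might be combined with a CRF, the sketch below matches POS-sequence templates against a tagged sentence and exposes each match as one extra per-token feature that a CRF toolkit could consume. The template patterns, tag names, feature names, and example sentence are all hypothetical; the abstract does not give the actual templates derived from MPO or the feature set used, so this is a sketch under those assumptions rather than the paper's implementation.

```python
from typing import Dict, List, Tuple

Token = Tuple[str, str]  # (word, POS tag)

# Hypothetical POS-sequence templates. The paper derives its templates from
# ontology instances; the concrete patterns are NOT given in the abstract.
TEMPLATES: List[Tuple[str, ...]] = [
    ("n", "n"),
    ("a", "n"),
    ("n", "m", "q"),
]


def template_tags(tokens: List[Token]) -> List[str]:
    """Mark tokens covered by a template match with B/I, all others with O."""
    tags = ["O"] * len(tokens)
    pos = [p for _, p in tokens]
    i = 0
    while i < len(tokens):
        # Prefer the longest template that matches starting at position i.
        best = 0
        for tpl in TEMPLATES:
            if tuple(pos[i:i + len(tpl)]) == tpl and len(tpl) > best:
                best = len(tpl)
        if best:
            tags[i] = "B"
            for j in range(i + 1, i + best):
                tags[j] = "I"
            i += best
        else:
            i += 1
    return tags


def crf_features(tokens: List[Token]) -> List[Dict[str, str]]:
    """Build per-token feature dicts; 'rule' carries the template evidence."""
    rule = template_tags(tokens)
    feats = []
    for i, (word, p) in enumerate(tokens):
        feats.append({
            "word": word,
            "pos": p,
            "prev_pos": tokens[i - 1][1] if i > 0 else "BOS",
            "next_pos": tokens[i + 1][1] if i < len(tokens) - 1 else "EOS",
            "rule": rule[i],  # assumed feature name, for illustration only
        })
    return feats


# Illustrative tagged sentence (words and tags are made up for the example).
sentence = [("使用", "v"), ("彩色", "a"), ("屏幕", "n"), ("的", "u"), ("手机", "n")]
features = crf_features(sentence)
```

Passing the rule match to the model as a feature, rather than treating it as a hard decision, leaves the statistical model free to override a template when the surrounding context disagrees. This is one plausible reading of "combining" rules with a CRF, not a confirmed detail of the paper.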
Source
《情报学报》
CSSCI
Peking University Core Journal (北大核心)
2009, No. 6, pp. 857-863 (7 pages)
Journal of the China Society for Scientific and Technical Information
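Funding
Supported by the National High Technology Research and Development Program of China (863 Program; grants 2006AA012152, 2006AA010109) and the National Natural Science Foundation of China (grant 60672149).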