Abstract
Named entity recognition is a fundamental task in natural language processing and is important for deeper language processing. This paper builds on the maximum entropy model for named entity recognition and proposes two HowNet-based strategies to improve the model's generalization. The first strategy adds the sememes of words in HowNet to the maximum entropy model as features. The second uses HowNet to compute the concept similarity between word features in the maximum entropy model. Experiments on the Peking University People's Daily corpus show that the first strategy effectively improves named entity recognition performance, while the second yields no clear improvement.
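The two strategies can be illustrated with a minimal sketch. It is not the authors' implementation: the toy lexicon `TOY_SEMEMES`, the feature naming scheme (`w[offset]=...`, `s[offset]=...`), the context window, and the function names are all hypothetical. The Jaccard overlap used for the second strategy is a simple stand-in, since the abstract does not specify the HowNet similarity formula that was used.

```python
from typing import Dict, List, Set

# Hypothetical toy lexicon: word -> set of HowNet-style sememes.
# The entries are illustrative only and are not taken from the real HowNet data.
TOY_SEMEMES: Dict[str, Set[str]] = {
    "北京": {"place", "capital", "ProperName"},
    "上海": {"place", "city", "ProperName"},
    "大学": {"InstitutePlace", "education"},
    "教授": {"human", "occupation", "education"},
}


def token_features(tokens: List[str], i: int, window: int = 1) -> Set[str]:
    """Strategy 1 (sketch): word features plus sememe features for token i."""
    feats: Set[str] = set()
    for offset in range(-window, window + 1):
        j = i + offset
        if 0 <= j < len(tokens):
            word = tokens[j]
            # Baseline lexical feature.
            feats.add(f"w[{offset}]={word}")
            # Sememe features let words that share a sememe share evidence,
            # even if one of them is rare or unseen in training.
            for sememe in TOY_SEMEMES.get(word, set()):
                feats.add(f"s[{offset}]={sememe}")
    return feats


def sememe_overlap(a: str, b: str) -> float:
    """Strategy 2 (stand-in): Jaccard overlap of the two words' sememe sets."""
    sa = TOY_SEMEMES.get(a, set())
    sb = TOY_SEMEMES.get(b, set())
    if not sa or not sb:
        # Without sememe information, fall back to exact string match.
        return 1.0 if a == b else 0.0
    return len(sa & sb) / len(sa | sb)


if __name__ == "__main__":
    sentence = ["北京", "大学", "教授"]
    print(sorted(token_features(sentence, 1)))
    print(sememe_overlap("北京", "上海"))
```

The resulting feature strings could be passed to any maximum entropy (multinomial logistic regression) trainer. The similarity score could be used to match a test-time word feature to a similar training-time one, which is one possible reading of the second strategy.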
Source
Journal of Chinese Information Processing (《中文信息学报》)
CSCD
PKU Core Journals (北大核心)
2008, Issue 5, pp. 97-101 (5 pages)
Funding
National Natural Science Foundation of China (60435020, 60673019)
National 863 Program of China (2006AA01Z197, 2007AA01Z172)
Natural Science Foundation of Heilongjiang Province (E200635)
Keywords
computer application
Chinese information processing
named entity recognition
concept similarity
HowNet
maximum entropy model