A Hybrid Method to Recognize Vietnamese Complex Named Entities Incorporating Entity Properties
Cited by: 2
Abstract: Named entity recognition (NER) is a basic task in natural language processing. To address the difficulty of recognizing complex Vietnamese named entities and the resulting low F-values, this paper proposes a hybrid Vietnamese named entity recognition method that incorporates an entity library. First, effective local and global features are selected according to the characteristics of the Vietnamese language and its entities, and a maximum entropy model is applied to recognize Vietnamese named entities. Second, Vietnamese named entities are recognized using the named entity rules formulated in this paper. Third, the two sets of recognition results are combined, with the rules as the primary principle and the statistics as the supplementary one. Finally, after manual proofreading, the correctly labeled entities are added to the entity library, which is thereby expanded dynamically and provides a rich corpus and a basis for formulating rules and selecting features. Experiments show that the method effectively combines the strengths of the rule-based and statistical approaches so that each offsets the other's weaknesses, and that recognition accuracy, recall, and F-value are all significantly improved.
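The combination step described in the abstract (rules as primary, statistics as supplementary) might be sketched as follows. This is only an illustrative reading, not the authors' implementation: the `Entity` type, the `merge_rule_first` function, and the policy of discarding any statistical entity that overlaps a rule-based one are all assumptions introduced here, since the abstract does not specify how conflicts are resolved.

```python
from typing import List, NamedTuple


class Entity(NamedTuple):
    """Hypothetical entity span over a token sequence."""
    start: int   # index of the first token (inclusive)
    end: int     # index after the last token (exclusive)
    label: str   # e.g. "PER", "LOC", "ORG"


def overlaps(a: Entity, b: Entity) -> bool:
    """Return True if the two half-open spans share at least one token."""
    return a.start < b.end and b.start < a.end


def merge_rule_first(rule_entities: List[Entity],
                     stat_entities: List[Entity]) -> List[Entity]:
    """Keep every rule-based entity; add a statistical entity only when it
    does not overlap any rule-based one (assumed conflict policy)."""
    merged = list(rule_entities)
    for candidate in stat_entities:
        if not any(overlaps(candidate, kept) for kept in rule_entities):
            merged.append(candidate)
    # Sort by start, then end, then label for a stable reading order.
    return sorted(merged)


# Example: the rule-based ORG span wins over the overlapping statistical
# LOC span, while the non-overlapping statistical PER span is kept.
rule_result = [Entity(0, 3, "ORG")]
stat_result = [Entity(1, 3, "LOC"), Entity(5, 7, "PER")]
combined = merge_rule_first(rule_result, stat_result)
```

Under this assumed policy the statistical model only fills gaps the rules leave open, which is one plausible way to let the two approaches compensate for each other.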
Source: CAAI Transactions on Intelligent Systems (《智能系统学报》; indexed in CSCD and the Peking University Core Journals list), 2016, No. 4, pp. 503-512 (10 pages)
Funding: National Natural Science Foundation of China (61262041, 61472168, 61562052); Key Project of the Natural Science Foundation of Yunnan Province (2013FA030)
Keywords: Vietnamese; entity library construction; entity recognition; maximum entropy; rules; entity characteristics; global features; local features