
Data Augmentation Method for New Type Person Named Entity Recognition (cited by: 5)
Abstract: Person name recognition is often performed as part of named entity recognition (NER), together with other entity types. Current NER-based person name recognition relies on the training corpus covering the particular type of person name, and performance degrades significantly when a new type of person name is encountered. To address this issue, we propose a data augmentation method that generates pseudo training data by replacing person name entities in the training data with entities of the new type; this effectively improves the system's recognition of new types of person names. To select representative names of a specific type, we propose a greedy representative subtype name selection algorithm. We evaluate on two test sets: pseudo test data generated automatically from the 1998 People's Daily corpus, and manually annotated news data. Across multiple models, the F1 score of person name recognition improves by at least 12 and 6 percentage points, respectively.
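The replacement strategy summarized in the abstract can be illustrated with a minimal Python sketch. It assumes character-level BIO tagging with a `PER` label, which the abstract does not specify. The function names `replace_person_names` and `greedy_select` are hypothetical, and the selection criterion used here (greedily picking the name that adds the most previously unseen characters) is only an illustrative stand-in, not the paper's actual selection algorithm.

```python
import random
from typing import List, Sequence, Tuple

Token = Tuple[str, str]  # (character, BIO tag); the tag scheme is an assumption


def replace_person_names(
    sentence: Sequence[Token], new_names: Sequence[str], rng: random.Random
) -> List[Token]:
    """Build one pseudo training sentence by swapping every PER span
    for a name sampled from `new_names` (assumed non-empty strings)."""
    out: List[Token] = []
    i = 0
    while i < len(sentence):
        char, tag = sentence[i]
        if tag == "B-PER":
            # Find the end of the original person-name span.
            j = i + 1
            while j < len(sentence) and sentence[j][1] == "I-PER":
                j += 1
            # Emit the replacement name with fresh BIO tags.
            name = rng.choice(new_names)
            out.append((name[0], "B-PER"))
            out.extend((c, "I-PER") for c in name[1:])
            i = j
        else:
            out.append((char, tag))
            i += 1
    return out


def greedy_select(names: Sequence[str], k: int) -> List[str]:
    """Hypothetical greedy selection: repeatedly pick the name that
    covers the most characters not yet covered, up to k names."""
    selected: List[str] = []
    covered: set = set()
    remaining = list(names)
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda n: len(set(n) - covered))
        if not set(best) - covered:
            break  # no remaining name adds new characters
        selected.append(best)
        covered |= set(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    sent = [("张", "B-PER"), ("伟", "I-PER"), ("访", "O"),
            ("问", "O"), ("北", "B-LOC"), ("京", "I-LOC")]
    pool = greedy_select(["阿卜杜拉", "穆罕默德", "阿卜杜"], k=2)
    print(replace_person_names(sent, pool, random.Random(0)))
```

Non-person tags are copied through unchanged, so the augmented sentence keeps its original context while only the name type varies. In practice the replacement pool would come from a list of names of the new type.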
Authors: SONG Xiliang (宋希良); HAN Xianpei (韩先培); SUN Le (孙乐) (Chinese Information Processing Laboratory, Institute of Software, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China)
Source: Journal of Chinese Information Processing (《中文信息学报》), indexed in CSCD and the Peking University Core Journals list, 2019, No. 6, pp. 72-79 (8 pages)
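Funding: National Natural Science Foundation of China (61433015, 61572477, 61772505); Young Elite Scientists Sponsorship Program of the China Association for Science and Technology (YESS20160177)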
Keywords: person name recognition; data augmentation; new type of person name