
Data Augmentation for Chinese Clinical Named Entity Recognition (基于数据增强的中文医疗命名实体识别)

Cited by: 9
Abstract: Chinese clinical named entity recognition plays an important role in identifying the medical entities contained in Chinese electronic medical records. Because large annotated datasets are lacking, most existing methods rely on external resources to improve recognition performance, which takes considerable time and requires effective rules for incorporating those resources. To address the shortage of annotated data, the authors propose a data augmentation algorithm based on a sequence generative adversarial network, which automatically generates a larger and more varied set of annotated data from the entities and non-entities in the training set. Experiments show that, when the generated data are used to expand the training set, the proposed named entity recognition system outperforms the baseline models in the experiments and achieves performance competitive with state-of-the-art methods, demonstrating the effectiveness of the data augmentation method.
Authors: WANG Peng-hui (王蓬辉), LI Ming-zheng (李明正), LI Si (李思); School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing 100876, China
Source: Journal of Beijing University of Posts and Telecommunications (《北京邮电大学学报》), 2020, No. 5, pp. 84-90 (7 pages). Indexed in: EI, CAS, CSCD, Peking University Core Journals (北大核心).
Funding: National Natural Science Foundation of China (61702047)
Keywords: named entity recognition; data augmentation; sequence generative adversarial network
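The abstract says the augmentation draws on "entities and non-entities in the training set" but gives no implementation detail. The sketch below shows one plausible preprocessing step only: grouping a BIO-tagged character sequence into entity and non-entity segments that a sequence generator could then be trained on. The BIO scheme, the function name `split_segments`, the label `SYMPTOM`, and the example sentence are all assumptions made for illustration and are not taken from the paper. The generative adversarial network itself is not shown.

```python
from typing import List, Optional, Tuple

# (text, label): label is an entity type, or "O" for non-entity text.
Segment = Tuple[str, str]


def split_segments(chars: List[str], tags: List[str]) -> List[Segment]:
    """Group a BIO-tagged character sequence into entity / non-entity segments.

    Hypothetical helper: assumes tags look like "B-TYPE", "I-TYPE", or "O".
    """
    if len(chars) != len(tags):
        raise ValueError("chars and tags must have the same length")

    segments: List[Segment] = []
    buffer: List[str] = []
    current: Optional[str] = None  # label of the segment being built

    for ch, tag in zip(chars, tags):
        if tag == "O":
            label = "O"
            starts_new = current != "O"
        else:
            prefix, _, entity_type = tag.partition("-")
            label = entity_type
            # A "B-" tag always opens a new entity; an "I-" tag opens one
            # only if it does not continue the current entity type.
            starts_new = prefix == "B" or current != entity_type

        if starts_new and buffer:
            segments.append(("".join(buffer), current))
            buffer = []

        buffer.append(ch)
        current = label

    if buffer:
        segments.append(("".join(buffer), current))
    return segments


# Hypothetical example sentence ("patient, stomach pain, three days").
example_chars = list("患者胃痛三天")
example_tags = ["O", "O", "B-SYMPTOM", "I-SYMPTOM", "O", "O"]
print(split_segments(example_chars, example_tags))
# [('患者', 'O'), ('胃痛', 'SYMPTOM'), ('三天', 'O')]
```

Keeping the entity label with each segment means that any text generated for a segment can inherit its annotation, which is one way generated data could arrive already labeled. Whether the paper does this is not stated in the abstract.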
