Named Entity Recognition in the Petroleum Refining Domain Based on Entity Knowledge
Abstract Named entity recognition (NER) in the petroleum refining domain faces two problems: labeled data are scarce, and existing pre-trained language models do not recognize the domain's combined and nested entities well. To address these problems, the paper first proposes External Entity Knowledge Replacement (EEKR), a data augmentation method based on external entity knowledge. EEKR introduces an external entity knowledge base and swaps its entries with the entities in the labeled data at the entity level, which effectively addresses the scarcity of labeled data. The paper then proposes Named Entity Recognition Incorporating Internal Entity Knowledge (IIEKNER), a model that obtains internal entity embeddings from the labeled samples and incorporates this internal entity knowledge into the pre-trained model, so that nested and combined entities in the petroleum refining domain are recognized more accurately. Experimental results show that the IIEKNER model trained with EEKR data augmentation achieves better recognition performance than the other models compared.
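The entity-level replacement described in the abstract can be illustrated with a minimal sketch. This is not the authors' EEKR implementation: the BIO tagging scheme, the `EQUIP` entity type, the example sentence, the knowledge-base layout (a dict from entity type to lists of tokens), and the function names `find_spans` and `replace_entities` are all assumptions made for illustration.

```python
import random
from typing import Dict, List, Sequence, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, entity_type)


def find_spans(tags: Sequence[str]) -> List[Span]:
    """Collect entity spans from a BIO tag sequence (assumed tagging scheme)."""
    spans: List[Span] = []
    start, etype = None, None
    # A trailing "O" sentinel closes a span that runs to the end of the sentence.
    for i, tag in enumerate(list(tags) + ["O"]):
        continues = start is not None and tag == f"I-{etype}"
        if start is not None and not continues:
            spans.append((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


def replace_entities(
    tokens: Sequence[str],
    tags: Sequence[str],
    knowledge_base: Dict[str, List[List[str]]],
    rng: random.Random,
) -> Tuple[List[str], List[str]]:
    """Swap each labeled entity for a same-type entry from the knowledge base."""
    out_tokens: List[str] = []
    out_tags: List[str] = []
    cursor = 0
    for start, end, etype in find_spans(tags):
        # Copy the non-entity context unchanged.
        out_tokens.extend(tokens[cursor:start])
        out_tags.extend(tags[cursor:start])
        # Ignore empty entries; keep the original entity if no candidate exists.
        candidates = [c for c in knowledge_base.get(etype, []) if c]
        new_entity = list(rng.choice(candidates)) if candidates else list(tokens[start:end])
        out_tokens.extend(new_entity)
        # Re-tag the replacement so tokens and tags stay aligned.
        out_tags.extend([f"B-{etype}"] + [f"I-{etype}"] * (len(new_entity) - 1))
        cursor = end
    out_tokens.extend(tokens[cursor:])
    out_tags.extend(tags[cursor:])
    return out_tokens, out_tags


# Hypothetical labeled sentence and a one-entry external knowledge base.
tokens = ["the", "crude", "distillation", "unit", "was", "shut", "down"]
tags = ["O", "B-EQUIP", "I-EQUIP", "I-EQUIP", "O", "O", "O"]
kb = {"EQUIP": [["catalytic", "cracker"]]}

aug_tokens, aug_tags = replace_entities(tokens, tags, kb, random.Random(0))
print(aug_tokens)  # ['the', 'catalytic', 'cracker', 'was', 'shut', 'down']
print(aug_tags)    # ['O', 'B-EQUIP', 'I-EQUIP', 'O', 'O', 'O']
```

Replacing only within the same entity type keeps the surrounding context valid for the label, and regenerating the tags from the replacement's length keeps each augmented sample well-formed even when the new entity is shorter or longer than the original.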
Authors DING Jianxin; WANG Xiaowei; WEN Xin; QU Kejiang; WANG Jianhua; ZHAO Yanhong; HU Siying (Smart Oil Services Business Unit, Kunlun Digital Technology Co., Ltd., Beijing 100071, China; College of Information Science and Engineering/College of Artificial Intelligence, China University of Petroleum (Beijing), Beijing 102249, China)
Source Modern Information Technology (《现代信息科技》), 2024, No. 12, pp. 40-46 (7 pages)
Funding National Key Research and Development Program of China (2019YFC0312003).
Keywords named entity recognition; petroleum refining domain; data augmentation; BERT