
Medical Named Entity Recognition Incorporating Part of Speech (融入词性的医疗命名实体识别研究)

Identifying Medical Named Entities with Word Information
Abstract: [Objective] To address the difficulty of recognizing named entity boundaries, this paper incorporates word information to improve the identification and inference of key clinical features in online consultation records. [Methods] The model is built on MacBERT and a conditional random field (CRF). Word information such as word position and part of speech is added through positional "soft" embeddings, and speaker role embeddings bring in information about the dialogue text. A weighted multi-class cross-entropy loss addresses the imbalance among entity categories. [Results] In an empirical study on online consultation records from Chunyu Doctor, the proposed model achieved an F1 score of 74.35% on the named entity recognition task, nearly 2 percentage points higher than using the MacBERT model directly. [Limitations] No dedicated model for Chinese word segmentation was designed. [Conclusions] Compared with modeling directly with MacBERT, incorporating word information and other additional feature dimensions effectively improves the model's recognition ability.
Authors: Ben Yanyan (本妍妍); Pang Xueqin (庞雪芹) (School of Mathematics and Statistics, Huazhong University of Science and Technology, Wuhan 430074, China; Archives of Wuhan University of Science and Technology, Wuhan 430081, China)
Source: Data Analysis and Knowledge Discovery (《数据分析与知识发现》), indexed in CSCD and the Peking University Core Journals list, 2023, No. 5, pp. 123-132 (10 pages)
Funding: This work is one of the research outcomes of a National Natural Science Foundation of China project (Grant No. 11971185).
Keywords: Chinese Named Entity Recognition; Online Medical Consultation; Word Information Embedding; MacBERT; Weighted Cross Entropy
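
The abstract names a weighted multi-class cross-entropy as the remedy for entity class imbalance, but it does not say how the class weights are chosen or how the loss is combined with the CRF layer. The sketch below is only an illustration of the general idea, under the assumption of inverse-frequency weights; the function names `inverse_frequency_weights` and `weighted_cross_entropy` are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels, num_classes):
    """Assumed scheme (not from the paper): w_c = N / (K * n_c).

    Classes that never occur in `labels` get weight 0.
    """
    counts = Counter(labels)
    total = len(labels)
    return [
        total / (num_classes * counts[c]) if counts[c] else 0.0
        for c in range(num_classes)
    ]


def softmax(logits):
    """Numerically stable softmax over one list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def weighted_cross_entropy(logits_batch, labels, weights):
    """Sum of -w[y] * log p(y) over tokens, divided by the sum of the weights used."""
    numerator = 0.0
    denominator = 0.0
    for logits, y in zip(logits_batch, labels):
        p = softmax(logits)
        numerator += -weights[y] * math.log(p[y])
        denominator += weights[y]
    return numerator / denominator


# Toy data: class 1 is rare, and the classifier always prefers class 0.
labels = [0, 0, 0, 1]
logits_batch = [[2.0, 0.0]] * 4
weights = inverse_frequency_weights(labels, num_classes=2)
weighted = weighted_cross_entropy(logits_batch, labels, weights)
unweighted = weighted_cross_entropy(logits_batch, labels, [1.0, 1.0])
```

In this toy case the weighted loss is larger than the unweighted one, because the error on the rare class counts for more. That is the intended effect of class weighting: mistakes on under-represented entity types are penalized more heavily during training.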
