
Study on Entity Extraction Method for Pharmaceutical Instructions Based on Pretrained Models
Abstract Extracting medical entities from drug instructions provides basic data for the intelligent retrieval of medication information and for the construction of medical knowledge graphs, so it has both research significance and practical value. Medical entities differ considerably across drug instructions for different kinds of diseases, and as a result model training normally requires a large number of annotated samples. To address this problem, the paper adopts a "large model + small model" design and proposes a part-label named entity recognition model based on a pre-trained model. A pre-trained language model fine-tuned on a small number of samples first extracts some of the entities from a drug instruction, and a Transformer-based part-label model then refines the extraction result. The part-label model encodes the input text, the partial entities already identified, and their entity labels with a planar lattice structure, extracts feature representations with a Transformer, and predicts entity labels through a conditional random field (CRF) layer. To reduce the amount of annotated data needed for training, a sample data augmentation method based on masking entities in annotated samples is proposed for training the part-label model. Experiments confirm the feasibility of the "large model + small model" approach for medical entity extraction: precision (P), recall (R), and F1 score reach 85.0%, 86.1%, and 85.6%, respectively, outperforming the other learning methods compared.
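The abstract does not spell out how the entity masking augmentation works. The sketch below shows one plausible reading, offered as an assumption rather than the authors' implementation: each annotated sample is copied several times, a random subset of its gold entities is hidden in each copy, the entities left visible play the role of the "partial entities" that the fine-tuned large model would supply, and the full gold annotation remains the training target. The names `Entity`, `mask_entities`, `augment`, `keep_prob`, and `n_variants`, the example sentence, and the `DRUG`/`SYMPTOM` labels are all hypothetical.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical span format: (start, end_exclusive, label).
Entity = Tuple[int, int, str]


def mask_entities(entities: List[Entity], keep_prob: float,
                  rng: random.Random) -> List[Entity]:
    """Keep each gold entity with probability keep_prob; the rest are masked."""
    return [e for e in entities if rng.random() < keep_prob]


def augment(text: str, entities: List[Entity], n_variants: int = 3,
            keep_prob: float = 0.5, seed: int = 0) -> List[Dict]:
    """Build n_variants training samples from one annotated sample.

    Assumption: the visible entities stand in for the partial output of
    the fine-tuned large model, and the full gold set is the target the
    small part-label model learns to recover.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n_variants):
        visible = mask_entities(entities, keep_prob, rng)
        samples.append({
            "text": text,
            "partial": visible,        # input: entities assumed already found
            "target": list(entities),  # output: complete annotation
        })
    return samples


# Illustrative sample (invented, not taken from the paper's data).
text = "Aspirin relieves headache and fever."
gold = [(0, 7, "DRUG"), (17, 25, "SYMPTOM"), (30, 35, "SYMPTOM")]
samples = augment(text, gold, n_variants=3, keep_prob=0.5, seed=0)
```

Under this reading, one annotated drug instruction yields several distinct input/target pairs, which is how masking could reduce the amount of manual annotation required.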
Authors CHEN Zhongyong; HUANG Yongsheng; ZHANG Min; JIANG Ming (Zhejiang Pharmaceutical Information Publicity and Development Service Center, Hangzhou 310061, China; School of Computer Science, Hangzhou Dianzi University, Hangzhou 310018, China)
Source Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》), indexed in CSCD and the Peking University Core Journals list, 2024, No. 7, pp. 1911-1922 (12 pages)
Funding "Pioneer" and "Leading Goose" R&D Program of Zhejiang Province (2023C01218).
Keywords named entity recognition (NER); pre-trained models; medical entity extraction; Transformer
