Abstract
To address the boundary ambiguity and entity ambiguity that affect named entity recognition (NER) models commonly used in traditional Chinese medicine (TCM), this paper proposes an NER method for TCM prescription texts based on BERT, a large-scale pre-trained Chinese language model. The BERT pre-trained model produces the word vectors for the input text. These vectors are fed into a bidirectional long short-term memory (Bi-LSTM) module, which captures the contextual semantic information of the text. Finally, a conditional random field (CRF) module decodes the output into a predicted label sequence, from which the various types of TCM prescription text entities are retrieved and ordered in turn, completing the entity recognition process. The results show that BERT is well suited to recognizing the various entity types in TCM prescriptions, and that the accuracy of entity recognition for TCM prescriptions improves significantly.
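The final CRF decoding step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the tag set (`O`, `B-HERB`, `I-HERB`), the per-character emission scores (standing in for what a BERT + Bi-LSTM encoder would output), and the transition penalty are all hypothetical values chosen for illustration. The sketch only shows how Viterbi decoding over emission and transition scores can correct an entity boundary that per-character argmax gets wrong.

```python
from typing import Dict, List, Tuple

# Hypothetical BIO tag set for a single entity type (herb names).
TAGS = ["O", "B-HERB", "I-HERB"]

# Hypothetical emission scores for a three-character input such as "用麻黄".
# In a real pipeline these would come from the BERT + Bi-LSTM encoder.
EMISSIONS = [
    {"O": 3.0, "B-HERB": 0.5, "I-HERB": 0.2},
    {"O": 0.1, "B-HERB": 1.5, "I-HERB": 2.0},
    {"O": 0.3, "B-HERB": 0.4, "I-HERB": 2.5},
]

# Hypothetical transition scores: an entity may not continue directly after "O".
# Transitions that are not listed default to a score of 0.0.
TRANSITIONS = {("O", "I-HERB"): -10.0}


def greedy_decode(emissions: List[Dict[str, float]], tags: List[str]) -> List[str]:
    """Pick the best tag per position independently (no CRF)."""
    return [max(tags, key=lambda tag: scores[tag]) for scores in emissions]


def viterbi_decode(
    emissions: List[Dict[str, float]],
    transitions: Dict[Tuple[str, str], float],
    tags: List[str],
) -> List[str]:
    """Return the highest-scoring tag sequence under emission + transition scores."""
    # best[tag] = score of the best path that ends in `tag` at the current position
    best = {tag: emissions[0][tag] for tag in tags}
    backpointers = []
    for scores in emissions[1:]:
        new_best, pointers = {}, {}
        for cur in tags:
            # Best previous tag for reaching `cur`
            prev = max(tags, key=lambda p: best[p] + transitions.get((p, cur), 0.0))
            new_best[cur] = best[prev] + transitions.get((prev, cur), 0.0) + scores[cur]
            pointers[cur] = prev
        best = new_best
        backpointers.append(pointers)
    # Follow the backpointers from the best final tag to recover the path
    path = [max(tags, key=lambda tag: best[tag])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]


if __name__ == "__main__":
    print(greedy_decode(EMISSIONS, TAGS))
    print(viterbi_decode(EMISSIONS, TRANSITIONS, TAGS))
```

With these illustrative scores, per-character argmax yields `O, I-HERB, I-HERB`, an entity with no beginning tag, whereas Viterbi decoding yields `O, B-HERB, I-HERB`, because the transition penalty makes the path through `B-HERB` score higher overall.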
Authors
徐丽娜
李燕
钟昕妤
陈月月
帅亚琦
XU Li-na; LI Yan; ZHONG Xin-yu; CHEN Yue-yue; SHUAI Ya-qi (Information Engineering Institution of Gansu University of Chinese Medicine, Lanzhou 730000, Gansu, China)
Source
《医学信息》
2023, Issue 4, pp. 32-37 (6 pages in total)
Journal of Medical Information
Keywords
Deep learning
Traditional Chinese medicine prescriptions
Named entity recognition models