Abstract
Objective: Lung cancer medical records contain rich information from the four diagnostic methods of traditional Chinese medicine (TCM), and this information is valuable for lung cancer research. This study extracts four-diagnosis entities with a BiGRU-CRF method based on character vectors. Methods: A BERT model was pre-trained on lung cancer clinical data that had been automatically annotated using a custom dictionary, yielding character vectors that encode contextual semantics. These vectors were then used as input to a BiGRU-CRF model to extract four-diagnosis named entities from lung cancer medical records. Results: For the five entity types (clinical manifestation, tongue manifestation, pulse manifestation, body part, and degree adverb), the proposed method achieved F1 scores of 98.17%, 99.74%, 99.77%, 94.72%, and 93.36%, respectively. The F1 scores of the comparison models on the same five types were (96.46%, 99.31%, 98.78%, 94.95%, 92.44%) for BERT-BiLSTM-CRF, (94.38%, 95.14%, 94.99%, 90.89%, 91.82%) for BERT, and (91.27%, 97.95%, 98.09%, 87.01%, 86.77%) for Word2vec-BiGRU-CRF. Conclusion: The character-vector-based BiGRU-CRF method used in this paper shows stronger named entity recognition ability and can be better applied to named entity extraction from TCM medical records, which in turn supports relation extraction and knowledge graph construction for medical records.
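The dictionary-based automatic annotation step mentioned in the Methods can be illustrated with a minimal sketch. The abstract does not say how dictionary entries are matched or which tag scheme is used, so everything here is an assumption made for illustration: forward maximum matching, character-level BIO tags, the label names (`SYM`, `TON`, `PUL`, `BOD`, `DEG`), the dictionary entries, and the function name `annotate`.

```python
from typing import Dict, List, Tuple

# Hypothetical custom dictionary: surface form -> entity label.
# The five labels mirror the entity types named in the abstract, but the
# label names and the entries are illustrative assumptions, not the
# authors' actual dictionary.
CUSTOM_DICT: Dict[str, str] = {
    "咳嗽": "SYM",    # clinical manifestation (cough)
    "舌红": "TON",    # tongue manifestation (red tongue)
    "脉细数": "PUL",  # pulse manifestation (thready, rapid pulse)
    "胸": "BOD",      # body part (chest)
    "明显": "DEG",    # degree adverb (marked)
}


def annotate(text: str, lexicon: Dict[str, str]) -> List[Tuple[str, str]]:
    """Tag every character with a BIO label using forward maximum matching."""
    max_len = max(map(len, lexicon), default=0)
    tagged: List[Tuple[str, str]] = []
    i = 0
    while i < len(text):
        match = None
        # Try the longest candidate first so longer entries win over shorter ones.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in lexicon:
                match = candidate
                break
        if match is None:
            # No dictionary entry starts here: the character is outside any entity.
            tagged.append((text[i], "O"))
            i += 1
        else:
            label = lexicon[match]
            tagged.append((match[0], f"B-{label}"))
            tagged.extend((ch, f"I-{label}") for ch in match[1:])
            i += len(match)
    return tagged


if __name__ == "__main__":
    for char, tag in annotate("咳嗽明显,舌红,脉细数", CUSTOM_DICT):
        print(char, tag)
```

Character-level tags of this kind line up one-to-one with character vectors, which is why a character-based tag scheme is a natural fit for the character-vector input described in the abstract.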
Authors
Qu Dandan
Yang Tao
Zhu Yao
Hu Kongfa
School of Artificial Intelligence and Information Technology, Nanjing University of Chinese Medicine, Nanjing 210023, China; The First Clinical Medical College, Nanjing University of Chinese Medicine, Nanjing 210023, China
Source
Modernization of Traditional Chinese Medicine and Materia Medica-World Science and Technology (《世界科学技术-中医药现代化》)
CSCD
Peking University Core Journal
2021, Issue 9, pp. 3118-3125 (8 pages)
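Funding
National Key R&D Program of China, Ministry of Science and Technology, key special project "Research on Modernization of Traditional Chinese Medicine" (2017YFC1703500): Construction of a TCM big data center and health cloud platform. Principal investigator: Li Guozheng
National Natural Science Foundation of China, General Program (82074580): Knowledge-graph-based study of the medication patterns and mechanisms of contemporary renowned veteran TCM physicians in diagnosing and treating lung cancer. Principal investigator: Hu Kongfa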