Abstract
Named entity recognition (NER) plays an important role in text mining of traditional Chinese medicine (TCM). This paper applies the BiLSTM-CRF method to NER on TCM medical case texts. Beyond basic entity recognition, the dataset is labeled with three categories (Chinese herbal medicines, diseases, and symptoms), so the model can also identify the category of each entity. Sequence labeling was performed on 10,292 sentences curated from TCM medical cases, and word vectors were built with word2vec for iterative model training. The resulting TCM NER model achieved an accuracy of 97.23%, a recall of 89.47%, and an F value of 88.34%. By category, Chinese herbal medicines reached a precision of 94.41%, a recall of 94.36%, and an F value of 94.38%; diseases reached a precision of 80.92%, a recall of 80.92%, and an F value of 80.92%; symptoms reached a precision of 75.68%, a recall of 81.68%, and an F value of 78.56%. Manual testing indicated that the model performs well and can recognize entities in medical case data. Many NER models exist, but very few target TCM; building TCM-specific NER models will more effectively advance TCM text mining.
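The per-category figures can be sanity-checked with a few lines of Python. This is a minimal sketch that assumes the reported "F value" is the standard F1 score (the harmonic mean of precision and recall); the abstract does not state the formula, and the function name `f_value` is purely illustrative. The overall figures are left out because they report accuracy rather than precision.

```python
def f_value(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1), in the same units as the inputs."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Per-category (precision %, recall %, reported F %) as quoted in the abstract.
reported = {
    "herbal medicine": (94.41, 94.36, 94.38),
    "disease": (80.92, 80.92, 80.92),
    "symptom": (75.68, 81.68, 78.56),
}

for category, (p, r, f_reported) in reported.items():
    f_computed = f_value(p, r)
    print(f"{category}: computed F = {f_computed:.2f}, reported F = {f_reported:.2f}")
```

Under the F1 assumption, the recomputed values agree with the reported ones to within about 0.01 percentage points, which is consistent with rounding.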
Authors
肖瑞
胡冯菊
裴卫
Xiao Rui; Hu Fengju; Pei Wei (Hubei University of Chinese Medicine, Wuhan 430065, China)
Source
《世界科学技术-中医药现代化》
CSCD
PKU Core Journals (北大核心)
2020, Issue 7, pp. 2504-2510 (7 pages)
Modernization of Traditional Chinese Medicine and Materia Medica-World Science and Technology
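Funding
Hubei University of Chinese Medicine 2017 "Qingmiao Plan" (青苗计划) project [No. 2017ZZX016]: Research on a diagnostic prediction algorithm for chronic hepatitis B based on TCM electronic medical records; principal investigator: Xiao Rui
National Administration of Traditional Chinese Medicine 2018 TCM Legal System Development Project [No. GZY-FJS-2018-162]: Monitoring of false and illegal TCM medical advertisements on the Internet; principal investigator: Xiao Rui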