Abstract
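[Objective] To integrate the structural and functional information of Chinese medical records, enrich the semantic content of medical record texts, and improve the accuracy of text representation and the effectiveness of subsequent text mining. [Methods] Based on the structural and functional characteristics of Chinese medical records, we propose a new semantic representation strategy for their texts. A BiLSTM-CRF model performs structure-based named entity recognition. Entity and structure information is then introduced at the word-vector level, and a TextCNN model extracts local contextual features, yielding a vector representation with richer semantic content. [Results] In the named entity recognition experiment, structure-based medical entity recognition achieved a precision, recall and F value of 93.20%, 95.19% and 94.19%, respectively. In the classification experiment used to validate the text representation, the proposed representation of medical record texts achieved a classification accuracy of 92.12%. [Limitations] The method still needs to be validated on more types of text, and the structure recognition process needs to be refined so that the method can be better applied to text mining. [Conclusions] This paper introduces the structural and functional information of medical records into the representation of medical record texts. Experiments show that doing so both improves the accuracy of named entity recognition and further enriches the semantic content of the texts, improving text representation.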
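The representation step can be illustrated with a minimal sketch. Each token's word vector is enriched with entity and structure information, and a TextCNN then reduces the token sequence to a fixed-length vector. This is not the authors' implementation, and several choices here are assumptions:

- Enrichment is done by plain concatenation of a word vector, an entity-tag vector (standing in for BiLSTM-CRF output) and a vector for the record section the token belongs to.
- Random vectors replace trained embeddings.
- The tag names, section names, dimensions and filter widths are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Filter = Tuple[List[Vector], float]  # (one weight vector per window position, bias)


def make_table(keys: Sequence[str], dim: int, rng: random.Random) -> Dict[str, Vector]:
    """Random lookup table standing in for trained embeddings (hypothetical)."""
    return {k: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for k in keys}


def enrich(tokens: Sequence[str], entity_tags: Sequence[str], sections: Sequence[str],
           word_emb: Dict[str, Vector], ent_emb: Dict[str, Vector],
           sec_emb: Dict[str, Vector]) -> List[Vector]:
    """Per token: word vector ++ entity-tag vector ++ record-section vector."""
    return [word_emb[w] + ent_emb[e] + sec_emb[s]
            for w, e, s in zip(tokens, entity_tags, sections)]


def make_filters(widths: Sequence[int], per_width: int, dim: int,
                 rng: random.Random) -> List[Filter]:
    """Randomly initialised convolution filters, `per_width` of each width."""
    return [([[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(w)], 0.0)
            for w in widths for _ in range(per_width)]


def conv_relu_maxpool(seq: List[Vector], filt: Filter) -> float:
    """Slide one filter over the token sequence, apply ReLU, keep the maximum."""
    weights, bias = filt
    width, dim = len(weights), len(seq[0])
    # Zero-pad sequences shorter than the filter so at least one window exists.
    padded = seq + [[0.0] * dim for _ in range(max(0, width - len(seq)))]
    best = 0.0  # starting at 0.0 makes max(best, score) equal to max over ReLU(score)
    for i in range(len(padded) - width + 1):
        score = bias + sum(
            sum(w * x for w, x in zip(weights[j], padded[i + j])) for j in range(width))
        best = max(best, score)
    return best


def text_cnn(seq: List[Vector], filters: List[Filter]) -> Vector:
    """TextCNN feature vector: one max-pooled activation per filter."""
    return [conv_relu_maxpool(seq, f) for f in filters]


# Toy example with hypothetical tokens, tags, sections and sizes.
rng = random.Random(0)
tokens = ["patient", "cough", "three_days", "fever"]
entity_tags = ["O", "B-SYMPTOM", "O", "B-SYMPTOM"]
sections = ["chief_complaint"] * 4
word_emb = make_table(sorted(set(tokens)), 8, rng)
ent_emb = make_table(["O", "B-SYMPTOM", "I-SYMPTOM"], 4, rng)
sec_emb = make_table(["chief_complaint", "present_illness"], 4, rng)

enriched = enrich(tokens, entity_tags, sections, word_emb, ent_emb, sec_emb)
filters = make_filters(widths=(2, 3, 4), per_width=2, dim=len(enriched[0]), rng=rng)
doc_vec = text_cnn(enriched, filters)
print(len(enriched[0]), len(doc_vec))  # prints: 16 6
```

Concatenation is only one plausible way of "introducing" entity and structure information at the word-vector level. The paper may weight, sum or otherwise combine these signals. A real system would also learn the embeddings and filters end to end, for example in a deep-learning framework.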
[Objective] This paper tries to improve the accuracy of text representation and text mining with the help of structural and functional information from Chinese medical records. [Methods] First, we proposed a new semantic representation strategy for the texts of Chinese medical records based on their structural and functional features. Then, we used the BiLSTM-CRF model to recognize named entities based on record structure and introduced entity and structure information at the word-vector level. Finally, we used the TextCNN model to extract local context features, obtaining a vector representation with richer semantic content. [Results] For structure-based medical entity recognition, precision, recall and F value reached 93.20%, 95.19% and 94.19%, respectively. The classification accuracy of the proposed text representation reached 92.12%. [Limitations] Future research is needed to evaluate our model on more types of text and to refine the structure recognition process. [Conclusions] The proposed method could effectively improve the accuracy of named entity recognition and enrich the semantic content and representation of the texts.
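The reported entity-recognition figures can be sanity-checked under one assumption: the "F value" is the standard F1 score, the harmonic mean of precision and recall with β = 1, which the abstract does not state explicitly. The sketch below also shows exact-match, entity-level scoring over sets of (start, end, type) spans. That is a common NER evaluation convention, but it is an assumption that the paper scores entities this way. The span format and entity types in the toy example are hypothetical.

```python
from typing import Set, Tuple

Entity = Tuple[int, int, str]  # (start, end, type); hypothetical span format


def f1_score(precision: float, recall: float) -> float:
    """F1: harmonic mean of precision and recall (beta = 1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def entity_prf(gold: Set[Entity], pred: Set[Entity]) -> Tuple[float, float, float]:
    """Exact-match entity-level precision, recall and F1."""
    tp = len(gold & pred)  # predicted spans that match a gold span exactly
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall, f1_score(precision, recall)


# Cross-check the reported figures: P = 93.20%, R = 95.19%.
print(f"{f1_score(0.9320, 0.9519):.4f}")  # prints 0.9418

# Toy example (hypothetical spans): 1 of 2 predictions correct, 1 of 3 gold entities found.
gold = {(0, 2, "SYMPTOM"), (3, 5, "DRUG"), (6, 7, "BODY")}
pred = {(0, 2, "SYMPTOM"), (3, 5, "SYMPTOM")}
p, r, f = entity_prf(gold, pred)
print(f"{p:.2f} {r:.2f} {f:.2f}")  # prints 0.50 0.33 0.40
```

Recomputing F1 from the rounded percentages gives about 94.18%, against the reported 94.19%. A gap of this size is what one would expect if F was computed from unrounded precision and recall, or from raw counts, before rounding. The figures therefore appear consistent.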
Authors
Hu Jiming (胡吉明); Qian Wei (钱玮); Wen Peng (文鹏); Lv Xiaoguang (吕晓光)
(School of Information Management, Wuhan University, Wuhan 430072, China; Information Retrieval and Knowledge Mining Laboratory, Wuhan University, Wuhan 430072, China; School of Marxism, Wuhan University, Wuhan 430072, China; Renmin Hospital of Wuhan University, Wuhan 430060, China)
Source
Data Analysis and Knowledge Discovery (《数据分析与知识发现》), 2022, No. 8, pp. 110-121 (12 pages)
Indexed in: CSSCI, CSCD, Peking University Core Journals
Funding
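This paper is one of the research outcomes of the General Program of the National Natural Science Foundation of China (Grant No. 71874125) and the Hubei Province Young Top-Notch Talent Training Program.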