With the rapid development of information technology,the electronifi-cation of medical records has gradually become a trend.In China,the population base is huge and the supporting medical institutions are numerous,so ...With the rapid development of information technology,the electronifi-cation of medical records has gradually become a trend.In China,the population base is huge and the supporting medical institutions are numerous,so this reality drives the conversion of paper medical records to electronic medical records.Electronic medical records are the basis for establishing a smart hospital and an important guarantee for achieving medical intelligence,and the massive amount of electronic medical record data is also an important data set for conducting research in the medical field.However,electronic medical records contain a large amount of private patient information,which must be desensitized before they are used as open resources.Therefore,to solve the above problems,data masking for Chinese electronic medical records with named entity recognition is proposed in this paper.Firstly,the text is vectorized to satisfy the required format of the model input.Secondly,since the input sentences may have a long or short length and the relationship between sentences in context is not negligible.To this end,a neural network model for named entity recognition based on bidirectional long short-term memory(BiLSTM)with conditional random fields(CRF)is constructed.Finally,the data masking operation is performed based on the named entity recog-nition results,mainly using regular expression filtering encryption and principal component analysis(PCA)word vector compression and replacement.In addi-tion,comparison experiments with the hidden markov model(HMM)model,LSTM-CRF model,and BiLSTM model are conducted in this paper.The experi-mental results show that the method used in this paper achieves 92.72%Accuracy,92.30%Recall,and 92.51%F1_score,which has higher accuracy compared with other models.展开更多
为解决多模态命名实体识别(Multimodal named entity recognition,MNER)方法研究中存在的图像特征语义缺失和多模态表示语义约束较弱等问题,提出多尺度视觉语义增强的多模态命名实体识别方法(Multi-scale visual semantic enhancement f...为解决多模态命名实体识别(Multimodal named entity recognition,MNER)方法研究中存在的图像特征语义缺失和多模态表示语义约束较弱等问题,提出多尺度视觉语义增强的多模态命名实体识别方法(Multi-scale visual semantic enhancement for multimodal named entity recognition method,MSVSE).该方法提取多种视觉特征用于补全图像语义,挖掘文本特征与多种视觉特征间的语义交互关系,生成多尺度视觉语义特征并进行融合,得到多尺度视觉语义增强的多模态文本表示;使用视觉实体分类器对多尺度视觉语义特征解码,实现视觉特征的语义一致性约束;调用多任务标签解码器挖掘多模态文本表示和文本特征的细粒度语义,通过联合解码解决语义偏差问题,从而进一步提高命名实体识别准确度.为验证该方法的有效性,在Twitter-2015和Twitter-2017数据集上进行实验,并与其他10种方法进行对比,该方法的平均F1值得到提升.展开更多
中文电子病历命名实体识别主要是研究电子病历病程记录文书数据集,文章提出对医疗手术麻醉文书数据集进行命名实体识别的研究。利用轻量级来自Transformer的双向编码器表示(A Lite Bidirectional Encoder Representation from Transform...中文电子病历命名实体识别主要是研究电子病历病程记录文书数据集,文章提出对医疗手术麻醉文书数据集进行命名实体识别的研究。利用轻量级来自Transformer的双向编码器表示(A Lite Bidirectional Encoder Representation from Transformers,ALBERT)预训练模型微调数据集和Tranfomers中的trainer训练器训练模型的方法,实现在医疗手术麻醉文书上识别手术麻醉事件命名实体与获取复杂麻醉医疗质量控制指标值。文章为医疗手术麻醉文书命名实体识别提供了可借鉴的思路,并且为计算复杂麻醉医疗质量控制指标值提供了一种新的解决方案。展开更多
The China Conference on Knowledge Graph and Semantic Computing(CCKS)2020 Evaluation Task 3 presented clinical named entity recognition and event extraction for the Chinese electronic medical records.Two annotated data...The China Conference on Knowledge Graph and Semantic Computing(CCKS)2020 Evaluation Task 3 presented clinical named entity recognition and event extraction for the Chinese electronic medical records.Two annotated data sets and some other additional resources for these two subtasks were provided for participators.This evaluation competition attracted 354 teams and 46 of them successfully submitted the valid results.The pre-trained language models are widely applied in this evaluation task.Data argumentation and external resources are also helpful.展开更多
Medical named entity recognition(NER)is an area in which medical named entities are recognized from medical texts,such as diseases,drugs,surgery reports,anatomical parts,and examination documents.Conventional medical ...Medical named entity recognition(NER)is an area in which medical named entities are recognized from medical texts,such as diseases,drugs,surgery reports,anatomical parts,and examination documents.Conventional medical NER methods do not make full use of un-labelled medical texts embedded in medical documents.To address this issue,we proposed a medical NER approach based on pre-trained language models and a domain dictionary.First,we constructed a medical entity dictionary by extracting medical entities from labelled medical texts and collecting medical entities from other resources,such as the YiduN4 K data set.Second,we employed this dictionary to train domain-specific pre-trained language models using un-labelled medical texts.Third,we employed a pseudo labelling mechanism in un-labelled medical texts to automatically annotate texts and create pseudo labels.Fourth,the BiLSTM-CRF sequence tagging model was used to fine-tune the pre-trained language models.Our experiments on the un-labelled medical texts,which were extracted from Chinese electronic medical records,show that the proposed NER approach enables the strict and relaxed F1 scores to be 88.7%and 95.3%,respectively.展开更多
基金This research was supported by the National Natural Science Foundation of China under Grant(No.42050102)the Postgraduate Education Reform Project of Jiangsu Province under Grant(No.SJCX22_0343)Also,this research was supported by Dou Wanchun Expert Workstation of Yunnan Province(No.202205AF150013).
文摘With the rapid development of information technology,the electronifi-cation of medical records has gradually become a trend.In China,the population base is huge and the supporting medical institutions are numerous,so this reality drives the conversion of paper medical records to electronic medical records.Electronic medical records are the basis for establishing a smart hospital and an important guarantee for achieving medical intelligence,and the massive amount of electronic medical record data is also an important data set for conducting research in the medical field.However,electronic medical records contain a large amount of private patient information,which must be desensitized before they are used as open resources.Therefore,to solve the above problems,data masking for Chinese electronic medical records with named entity recognition is proposed in this paper.Firstly,the text is vectorized to satisfy the required format of the model input.Secondly,since the input sentences may have a long or short length and the relationship between sentences in context is not negligible.To this end,a neural network model for named entity recognition based on bidirectional long short-term memory(BiLSTM)with conditional random fields(CRF)is constructed.Finally,the data masking operation is performed based on the named entity recog-nition results,mainly using regular expression filtering encryption and principal component analysis(PCA)word vector compression and replacement.In addi-tion,comparison experiments with the hidden markov model(HMM)model,LSTM-CRF model,and BiLSTM model are conducted in this paper.The experi-mental results show that the method used in this paper achieves 92.72%Accuracy,92.30%Recall,and 92.51%F1_score,which has higher accuracy compared with other models.
文摘为解决多模态命名实体识别(Multimodal named entity recognition,MNER)方法研究中存在的图像特征语义缺失和多模态表示语义约束较弱等问题,提出多尺度视觉语义增强的多模态命名实体识别方法(Multi-scale visual semantic enhancement for multimodal named entity recognition method,MSVSE).该方法提取多种视觉特征用于补全图像语义,挖掘文本特征与多种视觉特征间的语义交互关系,生成多尺度视觉语义特征并进行融合,得到多尺度视觉语义增强的多模态文本表示;使用视觉实体分类器对多尺度视觉语义特征解码,实现视觉特征的语义一致性约束;调用多任务标签解码器挖掘多模态文本表示和文本特征的细粒度语义,通过联合解码解决语义偏差问题,从而进一步提高命名实体识别准确度.为验证该方法的有效性,在Twitter-2015和Twitter-2017数据集上进行实验,并与其他10种方法进行对比,该方法的平均F1值得到提升.
文摘中文电子病历命名实体识别主要是研究电子病历病程记录文书数据集,文章提出对医疗手术麻醉文书数据集进行命名实体识别的研究。利用轻量级来自Transformer的双向编码器表示(A Lite Bidirectional Encoder Representation from Transformers,ALBERT)预训练模型微调数据集和Tranfomers中的trainer训练器训练模型的方法,实现在医疗手术麻醉文书上识别手术麻醉事件命名实体与获取复杂麻醉医疗质量控制指标值。文章为医疗手术麻醉文书命名实体识别提供了可借鉴的思路,并且为计算复杂麻醉医疗质量控制指标值提供了一种新的解决方案。
文摘中文电子病历实体包含大量的医学领域词汇并具有明显的嵌套特征。嵌套实体识别时往往存在目标实体定位不完整、不准确的问题。针对这一问题,提出了一种基于机器阅读理解的中文电子病历嵌套命名实体识别模型MRC-PBM(machine reading comprehension-position information biaffine and MLP)。该模型将命名实体识别(named entity recognition,NER)转化为机器阅读理解任务,将中文电子病历文本和预定义的查询语句串联作为输入,使用基于医学的预训练模型MC_BERT获取词向量,然后通过双向长短期记忆网络模型(BiLSTM)和多粒度扩张卷积模型分别获取双向的特征信息以及单词之间的信息,得到相应的特征向量,最后使用Hybrid-PBM预测器进行实体预测。在嵌套和平面NER数据集上进行实验。实验表明,该模型在糖尿病语料和公开医学数据集上优于其他主流神经网络模型,F1值比基线模型提高了1.21%~5.80%。
文摘The China Conference on Knowledge Graph and Semantic Computing(CCKS)2020 Evaluation Task 3 presented clinical named entity recognition and event extraction for the Chinese electronic medical records.Two annotated data sets and some other additional resources for these two subtasks were provided for participators.This evaluation competition attracted 354 teams and 46 of them successfully submitted the valid results.The pre-trained language models are widely applied in this evaluation task.Data argumentation and external resources are also helpful.
基金This work is supported in part by the Guangdong Science and Technology grant(No.2016A010101033)the Hong Kong and Macao joint research and development grant with Wuyi University(No.2019WGAH21).
文摘Medical named entity recognition(NER)is an area in which medical named entities are recognized from medical texts,such as diseases,drugs,surgery reports,anatomical parts,and examination documents.Conventional medical NER methods do not make full use of un-labelled medical texts embedded in medical documents.To address this issue,we proposed a medical NER approach based on pre-trained language models and a domain dictionary.First,we constructed a medical entity dictionary by extracting medical entities from labelled medical texts and collecting medical entities from other resources,such as the YiduN4 K data set.Second,we employed this dictionary to train domain-specific pre-trained language models using un-labelled medical texts.Third,we employed a pseudo labelling mechanism in un-labelled medical texts to automatically annotate texts and create pseudo labels.Fourth,the BiLSTM-CRF sequence tagging model was used to fine-tune the pre-trained language models.Our experiments on the un-labelled medical texts,which were extracted from Chinese electronic medical records,show that the proposed NER approach enables the strict and relaxed F1 scores to be 88.7%and 95.3%,respectively.