With the rapid development of information technology,the electronifi-cation of medical records has gradually become a trend.In China,the population base is huge and the supporting medical institutions are numerous,so ...With the rapid development of information technology,the electronifi-cation of medical records has gradually become a trend.In China,the population base is huge and the supporting medical institutions are numerous,so this reality drives the conversion of paper medical records to electronic medical records.Electronic medical records are the basis for establishing a smart hospital and an important guarantee for achieving medical intelligence,and the massive amount of electronic medical record data is also an important data set for conducting research in the medical field.However,electronic medical records contain a large amount of private patient information,which must be desensitized before they are used as open resources.Therefore,to solve the above problems,data masking for Chinese electronic medical records with named entity recognition is proposed in this paper.Firstly,the text is vectorized to satisfy the required format of the model input.Secondly,since the input sentences may have a long or short length and the relationship between sentences in context is not negligible.To this end,a neural network model for named entity recognition based on bidirectional long short-term memory(BiLSTM)with conditional random fields(CRF)is constructed.Finally,the data masking operation is performed based on the named entity recog-nition results,mainly using regular expression filtering encryption and principal component analysis(PCA)word vector compression and replacement.In addi-tion,comparison experiments with the hidden markov model(HMM)model,LSTM-CRF model,and BiLSTM model are conducted in this paper.The experi-mental results show that the method used in this paper achieves 92.72%Accuracy,92.30%Recall,and 92.51%F1_score,which has higher accuracy compared with other models.展开更多
基金This research was supported by the National Natural Science Foundation of China under Grant(No.42050102)the Postgraduate Education Reform Project of Jiangsu Province under Grant(No.SJCX22_0343)Also,this research was supported by Dou Wanchun Expert Workstation of Yunnan Province(No.202205AF150013).
文摘With the rapid development of information technology,the electronifi-cation of medical records has gradually become a trend.In China,the population base is huge and the supporting medical institutions are numerous,so this reality drives the conversion of paper medical records to electronic medical records.Electronic medical records are the basis for establishing a smart hospital and an important guarantee for achieving medical intelligence,and the massive amount of electronic medical record data is also an important data set for conducting research in the medical field.However,electronic medical records contain a large amount of private patient information,which must be desensitized before they are used as open resources.Therefore,to solve the above problems,data masking for Chinese electronic medical records with named entity recognition is proposed in this paper.Firstly,the text is vectorized to satisfy the required format of the model input.Secondly,since the input sentences may have a long or short length and the relationship between sentences in context is not negligible.To this end,a neural network model for named entity recognition based on bidirectional long short-term memory(BiLSTM)with conditional random fields(CRF)is constructed.Finally,the data masking operation is performed based on the named entity recog-nition results,mainly using regular expression filtering encryption and principal component analysis(PCA)word vector compression and replacement.In addi-tion,comparison experiments with the hidden markov model(HMM)model,LSTM-CRF model,and BiLSTM model are conducted in this paper.The experi-mental results show that the method used in this paper achieves 92.72%Accuracy,92.30%Recall,and 92.51%F1_score,which has higher accuracy compared with other models.