Abstract
To accurately extract key entities from epidemiological investigation records, and to support the establishment of a COVID-19 epidemiological investigation research and service data center, a named entity corpus was constructed from the epidemiological investigation texts of confirmed COVID-19 cases, and a named entity recognition method for such texts based on the BERT pre-trained language model is proposed. The method first uses BERT to dynamically generate a semantic vector for each character from its context, which serves as the model input. A bidirectional long short-term memory (BiLSTM) network with a conditional random field (CRF) layer then captures the contextual features of the input sequence and decodes the label sequence to extract 9 entity types. To further improve recognition, an attention layer was added to the model; experimental results show that the F1 score improved by a further 1.16% on top of 94.23%.
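The decoding step mentioned in the abstract (a CRF layer turning per-character scores into labels, from which entities are read off) can be illustrated with a minimal sketch. This is not the authors' implementation: the BIO tagging scheme, the `LOC` entity type, the function names `viterbi_decode` and `bio_to_spans`, and the numeric scores are all assumptions made for illustration, and the emission scores that would come from the BERT and BiLSTM layers are simply passed in as plain dictionaries.

```python
from typing import Dict, List, Tuple


def viterbi_decode(
    emissions: List[Dict[str, float]],
    transitions: Dict[Tuple[str, str], float],
    tags: List[str],
) -> List[str]:
    """Return the highest-scoring tag path for one sentence.

    emissions[i][t] is the score of tag t at position i (assumed here to be
    supplied by an upstream encoder); transitions[(a, b)] is the score for
    moving from tag a to tag b. Missing transitions are scored as 0.
    """
    if not emissions:
        return []
    # best[t] = score of the best path that ends in tag t at the current position
    best = {t: emissions[0][t] for t in tags}
    backpointers: List[Dict[str, str]] = []
    for scores in emissions[1:]:
        new_best: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for cur in tags:
            prev = max(tags, key=lambda p: best[p] + transitions.get((p, cur), 0.0))
            new_best[cur] = best[prev] + transitions.get((prev, cur), 0.0) + scores[cur]
            pointers[cur] = prev
        best = new_best
        backpointers.append(pointers)
    # Trace back from the best final tag to recover the full path.
    path = [max(tags, key=lambda t: best[t])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


def bio_to_spans(chars: List[str], labels: List[str]) -> List[Tuple[str, str]]:
    """Collect (entity_type, text) pairs from BIO labels over characters."""
    spans: List[Tuple[str, str]] = []
    current_type = None
    current_chars: List[str] = []
    for ch, label in zip(chars, labels):
        if label.startswith("B-"):
            # A new entity starts; close any entity that was still open.
            if current_type is not None:
                spans.append((current_type, "".join(current_chars)))
            current_type, current_chars = label[2:], [ch]
        elif label.startswith("I-") and current_type == label[2:]:
            current_chars.append(ch)
        else:
            # "O" or an inconsistent "I-" label ends the open entity.
            if current_type is not None:
                spans.append((current_type, "".join(current_chars)))
            current_type, current_chars = None, []
    if current_type is not None:
        spans.append((current_type, "".join(current_chars)))
    return spans


# Hypothetical tag set and scores, for illustration only.
TAGS = ["O", "B-LOC", "I-LOC"]
EMISSIONS = [
    {"O": 0.1, "B-LOC": 2.0, "I-LOC": 1.5},
    {"O": 1.0, "B-LOC": 0.2, "I-LOC": 0.9},
    {"O": 2.0, "B-LOC": 0.1, "I-LOC": 0.3},
]
TRANSITIONS = {("O", "I-LOC"): -10.0, ("B-LOC", "I-LOC"): 1.0, ("I-LOC", "I-LOC"): 0.5}

labels = viterbi_decode(EMISSIONS, TRANSITIONS, TAGS)
entities = bio_to_spans(["贵", "阳", "到"], labels)
```

The transition scores are what let the CRF prefer well-formed label sequences: in this toy setup the move from `O` to `I-LOC` is heavily penalized, so an entity is only continued after it has been opened.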
Authors
XU Meixian (徐美仙)
XIE Xiaoyao (谢晓尧)
ZHENG Xin (郑欣)
Key Laboratory of Information and Computing Science of Guizhou Province, Guizhou Normal University, Guiyang, Guizhou 550001, China
Source
Journal of Guizhou Normal University: Natural Sciences (《贵州师范大学学报(自然科学版)》)
CAS
2022, No. 3, pp. 73-81 (9 pages)
Funding
National Natural Science Foundation of China (Grant No. 61461009).
Keywords
epidemiological survey information
named entity recognition
character vector
BERT
attention mechanism