Abstract
Existing entity recognition methods do not account for the effect of education-domain terminology on recognition performance, which leads to poor model performance and blurred knowledge-entity boundaries. To address this, an education-domain entity recognition method based on character attention and dictionary features is proposed. The method first uses the BERT pre-trained language model to generate character vectors from contextual semantic information, and applies a part-of-speech-based character attention mechanism to reassign the weights of the characters in a sentence. The result is then concatenated with features from a purpose-built education-domain dictionary and fed into a BiLSTM network and an IDCNN network for feature extraction; an attention mechanism dynamically combines the outputs of the two layers by weighting them, yielding a fused feature representation. Finally, a conditional random field (CRF) computes the label sequence corresponding to the entities. Compared with existing methods, the proposed method achieves higher accuracy on an education-subject text corpus, with precision, recall, and F1 score of 90.71%, 91.37%, and 91.04%, respectively.
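The step in which an attention mechanism weights and combines the outputs of the BiLSTM and IDCNN layers can be illustrated with a small sketch. The abstract does not give the exact formulation, so everything below is an assumption made for illustration: the function name `fuse_branches`, the scoring vector `query` (standing in for a learned parameter), and the choice of a per-character softmax over the two branch scores. It should be read as one plausible form of attention-weighted fusion, not as the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a short list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def fuse_branches(h_bilstm, h_idcnn, query):
    """Fuse two per-character feature sequences with attention weights.

    h_bilstm, h_idcnn: lists of equal length, one feature vector per character.
    query: hypothetical learned scoring vector (an assumption of this sketch).
    Returns (fused sequence, per-character branch weights).
    """
    fused, weights = [], []
    for a, b in zip(h_bilstm, h_idcnn):
        # Score each branch at this character, then normalize to weights.
        w_a, w_b = softmax([dot(query, a), dot(query, b)])
        # The fused vector is the weighted sum of the two branch vectors.
        fused.append([w_a * x + w_b * y for x, y in zip(a, b)])
        weights.append((w_a, w_b))
    return fused, weights


if __name__ == "__main__":
    # Toy features for a 2-character sentence, dimension 2 (made-up values).
    h_bilstm = [[1.0, 0.0], [0.5, 0.5]]
    h_idcnn = [[0.0, 1.0], [0.5, 0.5]]
    query = [1.0, 0.0]
    fused, weights = fuse_branches(h_bilstm, h_idcnn, query)
    for vec, (w_a, w_b) in zip(fused, weights):
        print(vec, round(w_a, 3), round(w_b, 3))
```

In this toy setup the two branch weights at each character always sum to 1, so the fused vector stays on the same scale as its inputs. In a full model the fused sequence would then be passed to the CRF layer for label decoding.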
Authors
WANG Meng (王萌); LIU Chun-gang (刘春刚); ZHAO Hua (赵华)
School of Vocational Technology and Combustion Engineering, Hebei Normal University, Shijiazhuang 050024, China; Hebei Key Laboratory of Information Fusion and Intelligent Control, Shijiazhuang 050024, China
Source
Computer Technology and Development (《计算机技术与发展》)
2024, No. 7, pp. 168-174 (7 pages)
Funding
National Natural Science Foundation of China (62071167).
Keywords
entity recognition
dictionary feature
character attention
IDCNN
conditional random field