Abstract
To address insufficient feature information and low recognition efficiency in named entity recognition (NER) for earthquake prevention and disaster reduction, this study proposes an entity recognition model for this domain that integrates self-attention with MarkBERT. The model works in four steps: (1) MarkBERT introduces word boundary information during pre-training, producing a sequence that carries boundary information; (2) a BiLSTM captures character position information; (3) a self-attention mechanism further captures relationships within the sequence and assigns feature weights; (4) a conditional random field (CRF) outputs the optimal sequence labeling result. Experiments on the "BIO annotation data of earthquake prevention and control related questions" yielded an F1 score of 96.18%, and a comparison with three groups of similar models confirmed the advantage of the approach. The results indicate that the model identifies earthquake prevention and disaster reduction entities in text efficiently and accurately.
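The boundary-marking step and the BIO-based evaluation mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The marker token `[M]`, the entity labels `LOC` and `DIS`, the hand-segmented example sentence, and the choice of exact-match entity-level F1 are all assumptions made for illustration; the abstract does not specify them.

```python
from typing import List, Set, Tuple

MARK = "[M]"  # hypothetical marker token; the paper's actual marker is not stated


def insert_boundary_markers(words: List[str], mark: str = MARK) -> List[str]:
    """Split pre-segmented words into characters and append a marker after each word."""
    tokens: List[str] = []
    for word in words:
        tokens.extend(word)  # one token per character
        tokens.append(mark)  # the marker closes the word (assumed placement)
    return tokens


def bio_spans(tags: List[str]) -> Set[Tuple[int, int, str]]:
    """Collect (start, end_exclusive, type) entity spans from a BIO tag sequence."""
    spans: Set[Tuple[int, int, str]] = set()
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # trailing sentinel flushes the last span
        continues = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not continues:
            spans.add((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


def entity_f1(gold: List[str], pred: List[str]) -> float:
    """Entity-level F1: a predicted span counts only if boundaries and type both match."""
    gold_spans, pred_spans = bio_spans(gold), bio_spans(pred)
    tp = len(gold_spans & pred_spans)
    precision = tp / len(pred_spans) if pred_spans else 0.0
    recall = tp / len(gold_spans) if gold_spans else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: the segmentation and labels are supplied by hand.
words = ["兰州", "发生", "地震"]
gold = ["B-LOC", "I-LOC", "O", "O", "B-DIS", "I-DIS"]
pred = ["B-LOC", "I-LOC", "O", "O", "B-DIS", "O"]

print(insert_boundary_markers(words))
print(entity_f1(gold, pred))
```

In this toy case the prediction recovers one of the two gold entities exactly and truncates the other, so precision and recall are both 0.5. The strict span match is one common convention for NER scoring; the abstract does not say which convention underlies the reported 96.18%.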
Authors
XU Jing (徐婧); LIU Jiping (刘纪平); WANG Liang (王亮); WANG Yan (王岩) (Faculty of Geomatics, Lanzhou Jiaotong University, Lanzhou 730070, China; Chinese Academy of Surveying and Mapping, Beijing 100036, China; National-Local Joint Engineering Research Center of Technologies and Applications for National Geographic State Monitoring, Lanzhou 730070, China; Gansu Provincial Engineering Laboratory for National Geographic State Monitoring, Lanzhou 730070, China; School of Geomatics, Liaoning Technical University, Fuxin, Liaoning 123000, China)
Source
Science of Surveying and Mapping (《测绘科学》)
CSCD
Peking University Core Journal (北大核心)
2024, No. 1, pp. 216-224 (9 pages)
Funding
National Key Research and Development Program of China (2022YFC3003604)