Abstract
Recognizing place-name entities in classical Chinese poetry helps in two ways. It supports in-depth mining of the connections between poems, and it supports mapping the geographic distribution of Chinese poetry, which advances the spatial study of classical Chinese literature. This paper systematically collects classical poems related to the city of Nanjing and annotates their place-name entities with the BIOES scheme. Training data in the classical-poetry domain are scarce, and poems often use a single character where modern Chinese would use a multi-character word. To address these problems, the paper proposes a place-name recognition model for classical poetry that applies data augmentation and combines a pre-trained language model with a conditional random field. The model is referred to as DA-BERT-CRF. First, the training data are augmented by cross-exchanging entities between samples. Next, the pre-trained BERT model captures the contextual semantics of place names in the poems. Finally, a conditional random field (CRF) enforces constraints between place-name labels and generates the globally optimal label sequence. In ten-fold cross-validation, the proposed DA-BERT-CRF model achieves an average precision of 86.49%, an average recall of 90.44%, and an average F-score of 88.35%.
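BIOES annotation assigns one tag per character: B (begin), I (inside), E (end) for multi-character entities, S for single-character entities, and O for everything else. The following minimal sketch shows this for character-level place-name spans. The label name `LOC` and the helper `bioes_tags` are illustrative assumptions, not the paper's annotation tool. The example line is from Du Mu's "Mooring on the Qinhuai", where 秦淮 is a place name.

```python
def bioes_tags(text, spans, label="LOC"):
    """Return one BIOES tag per character of `text`.

    `spans` holds half-open (start, end) character offsets of place names.
    The label name "LOC" is an assumption for illustration.
    """
    tags = ["O"] * len(text)
    for start, end in spans:
        if end - start == 1:
            tags[start] = f"S-{label}"      # single-character place name
        else:
            tags[start] = f"B-{label}"      # first character
            for i in range(start + 1, end - 1):
                tags[i] = f"I-{label}"      # inner characters
            tags[end - 1] = f"E-{label}"    # last character
    return tags


line = "烟笼寒水月笼沙，夜泊秦淮近酒家"
print(list(zip(line, bioes_tags(line, [(10, 12)])))[8:13])
```

Character-level tags suit classical poetry because single characters often carry word-level meaning, so no word segmentation step is needed before tagging.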
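The abstract does not detail the entity cross-exchange procedure. The sketch below is one plausible reading, offered under that assumption: it picks one place name from each of two BIOES-tagged lines, swaps them, and regenerates the tags for the new entity lengths. The functions `spans_from_bioes`, `tags_for`, `replace_span` and `cross_exchange` are hypothetical names, and a single `LOC` label is assumed.

```python
import random


def spans_from_bioes(tags):
    """Recover half-open (start, end) entity spans from BIOES tags."""
    spans, start = [], None
    for i, tag in enumerate(tags):
        if tag.startswith("S-"):
            spans.append((i, i + 1))
        elif tag.startswith("B-"):
            start = i
        elif tag.startswith("E-") and start is not None:
            spans.append((start, i + 1))
            start = None
    return spans


def tags_for(length, label="LOC"):
    """BIOES tags for an entity of `length` characters."""
    if length == 1:
        return [f"S-{label}"]
    return [f"B-{label}"] + [f"I-{label}"] * (length - 2) + [f"E-{label}"]


def replace_span(text, tags, span, entity, label="LOC"):
    """Substitute `entity` for the span and re-tag it."""
    start, end = span
    return (text[:start] + entity + text[end:],
            tags[:start] + tags_for(len(entity), label) + tags[end:])


def cross_exchange(sample_a, sample_b, rng=None):
    """Swap one randomly chosen place name between two (text, tags) samples.

    Returns two new samples, or [] if either sample has no entity.
    """
    if rng is None:
        rng = random.Random(0)
    (text_a, tags_a), (text_b, tags_b) = sample_a, sample_b
    spans_a, spans_b = spans_from_bioes(tags_a), spans_from_bioes(tags_b)
    if not spans_a or not spans_b:
        return []
    span_a, span_b = rng.choice(spans_a), rng.choice(spans_b)
    entity_a = text_a[span_a[0]:span_a[1]]
    entity_b = text_b[span_b[0]:span_b[1]]
    return [replace_span(text_a, tags_a, span_a, entity_b),
            replace_span(text_b, tags_b, span_b, entity_a)]


a = ("夜泊秦淮近酒家", ["O", "O", "B-LOC", "E-LOC", "O", "O", "O"])
b = ("石头城上月", ["B-LOC", "I-LOC", "E-LOC", "O", "O"])
for text, tags in cross_exchange(a, b):
    print(text, tags)
```

Swapping entities keeps the context characters intact. The model therefore sees the same place-name context paired with different entity strings, which is the point of augmenting a small corpus this way. The generated lines need not respect meter or make historical sense.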
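At inference time, a CRF layer on top of BERT picks the tag sequence with the highest total of per-character emission scores plus tag-to-tag transition scores, and Viterbi decoding finds that sequence. The sketch below is a simplified version and makes several assumptions. Forbidden BIOES transitions (for example O→E or B→O) get a score of minus infinity, and all other transitions default to zero unless a `learned` score table is passed in. A trained CRF would learn those transition scores. The emission numbers in the example are made up.

```python
import math

TAGS = ["O", "B-LOC", "I-LOC", "E-LOC", "S-LOC"]
ALLOWED = {  # tags that may follow each tag under BIOES
    "O": {"O", "B-LOC", "S-LOC"},
    "B-LOC": {"I-LOC", "E-LOC"},
    "I-LOC": {"I-LOC", "E-LOC"},
    "E-LOC": {"O", "B-LOC", "S-LOC"},
    "S-LOC": {"O", "B-LOC", "S-LOC"},
}
START = {"O", "B-LOC", "S-LOC"}  # tags allowed at the first character
END = {"O", "E-LOC", "S-LOC"}    # tags allowed at the last character
NEG = -math.inf


def transition(prev, cur, learned=None):
    """Transition score: -inf if BIOES forbids it, else a learned score or 0."""
    if cur not in ALLOWED[prev]:
        return NEG
    return (learned or {}).get((prev, cur), 0.0)


def viterbi(emissions, learned=None):
    """emissions[t][k] is the score of TAGS[k] at character t."""
    n = len(emissions)
    score = [[NEG] * len(TAGS) for _ in range(n)]
    back = [[0] * len(TAGS) for _ in range(n)]
    for k, tag in enumerate(TAGS):
        if tag in START:
            score[0][k] = emissions[0][k]
    for t in range(1, n):
        for k, cur in enumerate(TAGS):
            best_j, best = 0, NEG
            for j, prev in enumerate(TAGS):
                s = score[t - 1][j] + transition(prev, cur, learned)
                if s > best:
                    best_j, best = j, s
            score[t][k] = best + emissions[t][k]
            back[t][k] = best_j
    # Best valid final tag, then follow back-pointers to the start.
    last = max((k for k, tag in enumerate(TAGS) if tag in END),
               key=lambda k: score[-1][k])
    path = [last]
    for t in range(n - 1, 0, -1):
        path.append(back[t][path[-1]])
    return [TAGS[k] for k in reversed(path)]


# Scores for 夜泊秦淮: greedy per-character argmax gives O O O E-LOC (invalid).
emissions = [[5, 0, 0, 0, 0], [5, 0, 0, 0, 0], [2, 1.5, 0, 0, 0], [0, 0, 0, 3, 0]]
print(viterbi(emissions))
```

Choosing tags independently per character would produce the illegal O→E-LOC sequence in this example. Decoding the whole sequence instead yields a valid and globally optimal labeling, which is the role the abstract gives the CRF.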
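The reported figures are averages of precision, recall and F-score over ten folds. The sketch below computes entity-level scores with exact span matching and then averages them per fold. Exact matching is an assumption, because the abstract does not state the matching criterion. The names `entity_spans`, `prf` and `average_over_folds` are hypothetical.

```python
def entity_spans(tags):
    """Set of half-open (start, end) place-name spans in a BIOES sequence."""
    spans, start = set(), None
    for i, tag in enumerate(tags):
        if tag.startswith("S-"):
            spans.add((i, i + 1))
            start = None
        elif tag.startswith("B-"):
            start = i
        elif tag.startswith("E-") and start is not None:
            spans.add((start, i + 1))
            start = None
        elif tag == "O":
            start = None


    return spans


def prf(gold_seqs, pred_seqs):
    """Entity-level precision, recall and F1 with exact span matching."""
    tp = fp = fn = 0
    for gold, pred in zip(gold_seqs, pred_seqs):
        g, p = entity_spans(gold), entity_spans(pred)
        tp += len(g & p)
        fp += len(p - g)
        fn += len(g - p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def average_over_folds(fold_scores):
    """Mean of (precision, recall, F1) tuples, one tuple per fold."""
    n = len(fold_scores)
    return tuple(sum(scores[i] for scores in fold_scores) / n for i in range(3))


gold = [["O", "B-LOC", "E-LOC", "O", "S-LOC"]]
pred = [["O", "B-LOC", "E-LOC", "O", "O"]]
print(prf(gold, pred))
```

Averaging F over folds is not the same as recomputing F from the averaged precision and recall. The harmonic mean of 86.49% and 90.44% is about 88.42%, not the reported 88.35%, which is consistent with the F-score having been averaged per fold.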
Authors
Yu Xinling (余馨玲)
Chang E (常娥)
Affiliations: School of Economics and Management, Southeast University; Southeast University Library
Source
Library Journal (《图书馆杂志》)
CSSCI
Peking University Core Journal (北大核心)
2023, No. 10, pp. 87-94, 73 (9 pages)
Keywords
Deep learning model
Place name entity recognition
Ancient poetry
Data augmentation