Incorporating Similarity Negative Sampling for Distantly Supervised NER
Abstract: Entity omission (entities left unlabeled) is a persistent difficulty in distantly supervised named entity recognition (DS-NER). Unlabeled entities in the training set supply incorrect supervision: the model learns to predict such entities as non-entities, which degrades its entity recognition and classification ability and hurts its generalization. To address this problem, this paper proposes a negative sampling method for named entity recognition that incorporates entity feature similarity. First, similarity is computed between each candidate sample and the labeled entity samples, and each candidate is scored. Second, candidates are sampled according to their similarity scores to select the samples that take part in training. Compared with random negative sampling, using similarity lowers the chance of sampling an unlabeled entity as a negative, which improves the quality of the training data and, in turn, model performance. Experiments on three datasets, CoNLL03, Wiki, and Twitter, show an average F1 improvement of about 5 percentage points over the baseline model, indicating that the method effectively alleviates the performance degradation that entity omission causes in named entity recognition models under distant supervision.
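The two steps in the abstract (score candidates by similarity to labeled entities, then sample by score) can be sketched as follows. The abstract does not state how samples are represented, which similarity measure is used, or how scores are turned into sampling probabilities, so everything concrete here is an assumption made for illustration: candidates and entities are plain feature vectors, similarity is cosine similarity, a candidate's score is its highest similarity to any labeled entity, and the sampling weight is `exp(-score / temperature)`. The names `cosine`, `similarity_scores`, `sample_negatives`, and `temperature` are hypothetical and do not come from the paper.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_scores(candidates, entities):
    """Score each candidate by its highest similarity to any labeled entity.

    Assumption: taking the maximum is one plausible scoring rule, not
    necessarily the one used in the paper.
    """
    return [
        max((cosine(c, e) for e in entities), default=0.0)
        for c in candidates
    ]


def sample_negatives(candidates, entities, k, temperature=0.1, rng=None):
    """Draw up to k distinct candidate indices, favouring low-similarity ones.

    Assumed weighting: w_i = exp(-score_i / temperature), so candidates that
    look like labeled entities (and may be unlabeled entities) are rarely
    drawn as negatives.
    """
    rng = rng or random.Random()
    scores = similarity_scores(candidates, entities)
    weights = [math.exp(-s / temperature) for s in scores]
    pool = list(range(len(candidates)))
    chosen = []
    for _ in range(min(k, len(pool))):
        # Weighted draw over the remaining pool, without replacement.
        pick = rng.choices(range(len(pool)), weights=[weights[i] for i in pool])[0]
        chosen.append(pool.pop(pick))
    return chosen


if __name__ == "__main__":
    entity_vectors = [[1.0, 0.0]]
    candidate_vectors = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
    print(similarity_scores(candidate_vectors, entity_vectors))
    print(sample_negatives(candidate_vectors, entity_vectors, k=1))
```

In this toy setup the second candidate is orthogonal to the only labeled entity, so it receives by far the largest sampling weight, while the two candidates that resemble the entity are almost never selected as negatives. A lower `temperature` makes the sampler avoid entity-like candidates more strictly; a higher one moves it toward uniform random negative sampling.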
Authors: Liu Yang; Xian Yantuan; Xiang Yan; Huang Yuxin (Faculty of Information Engineering & Automation, Kunming University of Science & Technology, Kunming 650500, China; Yunnan Key Laboratory of Artificial Intelligence, Kunming 650500, China)
Source: Application Research of Computers (《计算机应用研究》), indexed in CSCD and the Peking University Core Journals list, 2024, No. 8, pp. 2322-2328 (7 pages)
Funding: National Natural Science Foundation of China (62266028); Yunnan Major Science and Technology Special Program (202202AD080003).
Keywords: named entity recognition; entity omission; distant supervision; negative sampling; data augmentation