
Fine-grained Entity Recognition Model in Audit Domain Based on Adversarial Migration of Sample Contributions
Abstract: Fine-grained named entity recognition (NER) identifies entity information in pro-poor (poverty-alleviation) texts in the audit domain, which is crucial for analysing and evaluating the effectiveness of pro-poor policies. In recent years, deep learning has achieved significant results on fine-grained NER tasks, but specific domains still face three problems: scarce corpora, fine-grained features that become increasingly incompatible under transfer learning, and data imbalance. To address the scarcity of datasets in the audit domain, we formulate a fine-grained pro-poor audit entity labelling scheme and construct a fine-grained pro-poor audit corpus (FG-PAudit-Corpus). We then propose FGATSC, a fine-grained entity recognition model based on adversarial transfer of sample contributions. FGATSC is trained with adversarial transfer and incorporates sample-contribution weights into the transferred features to resolve the incompatibility of fine-grained features. In addition, to reduce the effect of the imbalance between the high-resource source domain and the low-resource pro-poor audit domain, we propose a balanced resource adversarial discriminator (BRAD). Experimental results show that FGATSC achieves an F1 score of 75.83% on FG-PAudit-Corpus, 9.03% higher than the baseline model and 4.01% to 6.53% higher than other mainstream models. In a generalisation test on the Resume dataset, it reaches an F1 of 95.77%, about 0.14% to 1.31% higher than recent mainstream models. These results verify the effectiveness and generalisability of the FGATSC model.
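The abstract does not specify how BRAD or the sample-contribution weights are computed. As a hedged illustration only, the pure-Python sketch below shows one common way to combine these ideas: a domain-adversarial (gradient-reversal style) objective in which each sample's discriminator loss is scaled by a contribution weight, and each domain is averaged separately so that the high-resource source domain cannot outweigh the low-resource audit domain. The function names, the per-domain averaging rule, the domain labels (1 = source, 0 = audit target) and the trade-off factor `lam` are all assumptions for illustration, not the authors' FGATSC/BRAD formulation.

```python
import math
from typing import Sequence


def bce(p: float, y: int, eps: float = 1e-12) -> float:
    """Binary cross-entropy for one discriminator output p = P(source | features)."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def balanced_adversarial_loss(
    probs: Sequence[float],    # discriminator outputs P(source) per sample
    domains: Sequence[int],    # 1 = high-resource source, 0 = low-resource audit target
    contrib: Sequence[float],  # hypothetical per-sample contribution weights (>= 0)
) -> float:
    """Discriminator loss in which each domain counts equally (assumed balancing rule).

    Within a domain, losses are a contribution-weighted average; the final loss is
    the plain mean over domains, so sample-count imbalance does not dominate.
    """
    totals = {0: 0.0, 1: 0.0}
    weights = {0: 0.0, 1: 0.0}
    for p, d, w in zip(probs, domains, contrib):
        totals[d] += w * bce(p, d)
        weights[d] += w
    per_domain = [totals[d] / weights[d] for d in (0, 1) if weights[d] > 0]
    return sum(per_domain) / len(per_domain)


def encoder_objective(task_loss: float, adv_loss: float, lam: float) -> float:
    """Objective minimised by the shared encoder under gradient reversal.

    The discriminator minimises adv_loss, while the encoder minimises
    task_loss - lam * adv_loss, i.e. it tries to make domains indistinguishable.
    """
    return task_loss - lam * adv_loss
```

With this rule, 100 easily recognised source samples and a single ambiguous audit sample contribute equally to the discriminator loss, which is the kind of balancing the abstract motivates; a real implementation would compute these quantities on tensors inside a training framework.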
Authors: PANG Bowen (庞博文), CHEN Yifei (陈一飞), HUANG Jia (黄佳) (School of Computer Science, Nanjing Audit University, Nanjing 211815, China)
Source: Computer Science (《计算机科学》), CSCD, Peking University Core Journal, 2024, Issue S02, pp. 136-143 (8 pages)
Keywords: fine-grained entity recognition; pro-poor auditing; adversarial training; sample contribution; balancing resources