Abstract
Domain adaptation is a general approach to low-resource problems and can be applied to a wide range of natural language processing tasks. Current research on domain adaptation for named entity recognition (NER) usually transfers from a single source domain to a single target domain. This works well when the target domain is close to the source domain, but it is severely limited when the two are only weakly related. To address this problem, we propose a multi-source domain adaptation NER model based on importance weighting (MDAIW). MDAIW 1) improves entity recognition in the target domain by transferring knowledge from multiple domains, 2) computes a domain contribution from the importance of each source domain, and of the samples within it, to the target domain, and 3) incorporates the domain contribution into the NER model to improve its domain adaptability. Experiments on several target domains show that the model outperforms the current best-performing method, validating its effectiveness.
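The abstract does not say how the domain contribution is computed, so the following is only a minimal sketch of one way importance weighting over several source domains could look. Everything in it is an assumption made for illustration: the cosine-similarity scoring, the softmax normalization, the toy two-dimensional feature vectors, the domain names `news` and `biomedical`, and all function names. It is not the MDAIW implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def softmax(scores):
    """Normalize scores into positive weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mean_vector(vectors):
    """Component-wise mean of a list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def domain_contributions(source_domains, target_vectors):
    """Assumed domain-level weight: similarity of each source centroid
    to the target centroid, normalized across source domains."""
    target_center = mean_vector(target_vectors)
    names = list(source_domains)
    scores = [cosine(mean_vector(source_domains[n]), target_center) for n in names]
    return dict(zip(names, softmax(scores)))


def sample_weights(samples, target_vectors):
    """Assumed sample-level weight: similarity of each sample to the
    target centroid, normalized within its own domain."""
    target_center = mean_vector(target_vectors)
    return softmax([cosine(s, target_center) for s in samples])


def weighted_loss(per_sample_losses, source_domains, target_vectors):
    """Combine per-sample losses using domain and sample weights."""
    dom_w = domain_contributions(source_domains, target_vectors)
    total = 0.0
    for name, samples in source_domains.items():
        smp_w = sample_weights(samples, target_vectors)
        losses = per_sample_losses[name]
        total += dom_w[name] * sum(w * l for w, l in zip(smp_w, losses))
    return total


# Hypothetical toy data: two source domains and one target domain.
sources = {
    "news": [[1.0, 0.0], [0.9, 0.1]],
    "biomedical": [[0.0, 1.0], [0.1, 0.9]],
}
target = [[0.8, 0.2], [1.0, 0.1]]
weights = domain_contributions(sources, target)
print(weights)
```

In this sketch, a source domain whose samples lie closer to the target data receives a larger weight, so its training loss counts for more. A real system would derive the features and the per-sample losses from the NER model itself rather than from hand-written vectors.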
Authors
李佳芮
刘健
陈钰枫
徐金安
张玉洁
LI Jiarui; LIU Jian; CHEN Yufeng; XU Jin'an; ZHANG Yujie (School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China)
Source
《厦门大学学报(自然科学版)》
CAS
CSCD
PKU Core Journals
2022, No. 4, pp. 617-623 (7 pages)
Journal of Xiamen University: Natural Science
Funding
National Natural Science Foundation of China (61976016, 61976015, 61876198)
National Key Research and Development Program of China (2019YFB1405200)
Keywords
named entity recognition
domain adaptation
importance weighting
multi-source