Abstract
Chinese word segmentation is one of the common tasks in natural language processing. In cross-domain segmentation tasks, the different data distribution of the target domain and the lack of sufficient training data often cause performance to drop sharply. To address this problem, we propose a cross-domain Chinese word segmentation model based on feature transfer, which introduces transfer learning, adversarial learning, and orthogonal constraints to reduce the interference between shared and private features. Under cross-domain and low-resource conditions, the model can draw on knowledge from a source domain with abundant data. Experimental results show that the model achieves excellent performance.
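The orthogonal constraint mentioned in the abstract is commonly realized as a squared Frobenius-norm penalty on the product of the shared and private feature matrices, which pushes the two feature subspaces apart. The NumPy sketch below illustrates this general idea; the function name, toy dimensions, and data are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def orthogonality_penalty(shared, private):
    """Squared Frobenius norm of shared^T @ private.

    A smaller value means the shared and private feature
    matrices occupy more nearly orthogonal subspaces.
    """
    return float(np.sum((shared.T @ private) ** 2))

# Toy example: a batch of 4 samples with 3-dimensional features.
rng = np.random.default_rng(0)
shared = rng.standard_normal((4, 3))
private = rng.standard_normal((4, 3))

# Random features are almost never orthogonal, so the penalty is positive.
penalty = orthogonality_penalty(shared, private)

# Projecting the private features onto the orthogonal complement of the
# shared feature column space drives the penalty to (numerically) zero.
q, _ = np.linalg.qr(shared)                    # orthonormal basis of shared columns
private_orth = private - q @ (q.T @ private)   # remove shared-space component
```

In a neural model this penalty would be added to the training loss alongside the segmentation objective and the adversarial term, so that minimizing it discourages the shared encoder from duplicating domain-specific information held by the private encoder.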
Authors
ZHANG Taozheng; ZHANG Jiajian (School of Information and Communication Engineering, Communication University of China, Beijing 100024, China)
Source
Journal of Communication University of China (Science and Technology) 《中国传媒大学学报(自然科学版)》, 2021, No. 3, pp. 41-45, 74 (6 pages in total)
Fund
Supported by the Fundamental Research Funds for the Central Universities, Communication University of China (3132018XNG1829).
Keywords
transfer learning
adversarial learning
orthogonal constraints
Chinese word segmentation