摘要
该文提出了一种基于感知器的中文分词增量训练方法。该方法可在训练好的模型基础上添加目标领域标注数据继续训练,解决了大规模切分数据难于共享,源领域与目标领域数据混合需要重新训练等问题。实验表明,增量训练可以有效提升领域适应性,达到与传统数据混合相类似的效果。同时该文方法模型占用空间小,训练时间短,可以快速训练获得目标领域的模型。
In this paper, we propose an incremental learning scheme for perceptron based Chinese word segmentation. Our method can perform continuous training over a fine tuned source domain model, enabling to deliver model without annotated data and re-training. Experimental results shows the scheme proposed can significantly improve adaptation performance on Chinese word segmentation and achieve comparable performance with traditional method. At the same time, our method can significantly reduce the model size and the training time.
出处
《中文信息学报》
CSCD
北大核心
2015年第5期49-54,共6页
Journal of Chinese Information Processing
关键词
中文分词
领域适应
增量训练
Chinese word segmentation
domain adaptation
incremental learning