Abstract
Transfer learning is a research direction in data mining that reuses data instances from related domains. It "transfers" knowledge from those domains to help train models for a new (target) domain. Existing instance-based transfer learning algorithms are prone to overfitting and cannot make full use of the useful data in the related (source) domains. To address this problem, we propose a new transfer learning algorithm that brings additional unlabeled instances from the target domain into training, so that more domain knowledge enters the learning process. Specifically, under the generalized framework of boosting methods, we show that a semi-supervised boosting method can be used to re-weight the source-domain instances. This lets the algorithm judge the relevance of each instance more accurately and reduces the impact of selection bias. It also makes the final classifier less sensitive to the small number of labeled instances in the target domain. Experiments on a large number of text datasets demonstrate the effectiveness of the new algorithm.
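The abstract builds on boosting-based re-weighting of source-domain instances. That idea can be illustrated with the update rule of TrAdaBoost (Dai et al., ICML 2007), a widely used instance-transfer baseline. The sketch below shows only that standard rule, not the semi-supervised algorithm proposed in this paper, which additionally uses unlabeled target instances. The function name `tradaboost_reweight` and its parameters are illustrative assumptions. Each error list holds |h(x) - y| in {0, 1} for one instance under the current weak classifier h.

```python
import math
from typing import List, Tuple


def tradaboost_reweight(
    src_w: List[float],
    tgt_w: List[float],
    src_err: List[int],
    tgt_err: List[int],
    n_rounds: int,
) -> Tuple[List[float], List[float], float]:
    """One TrAdaBoost-style re-weighting round (illustrative sketch only).

    This is the standard TrAdaBoost update, NOT the paper's semi-supervised
    method. src_err / tgt_err contain |h(x) - y| in {0, 1} per instance.
    Returns (new source weights, new target weights, beta_t).
    """
    n = len(src_w)
    # Weighted error is measured on the labeled target instances only.
    eps = sum(w * e for w, e in zip(tgt_w, tgt_err)) / sum(tgt_w)
    # Clip so that 0 < eps < 0.5 and beta_t stays in (0, 1).
    eps = min(max(eps, 1e-10), 0.499)
    beta_t = eps / (1.0 - eps)
    # Fixed factor (< 1 when n > 1) used to shrink misclassified source weights.
    beta = 1.0 / (1.0 + math.sqrt(2.0 * math.log(n) / n_rounds))
    # Misclassified source instances look less relevant: decrease their weight.
    new_src = [w * beta ** e for w, e in zip(src_w, src_err)]
    # Misclassified target instances are hard: increase their weight (AdaBoost-style).
    new_tgt = [w * beta_t ** (-e) for w, e in zip(tgt_w, tgt_err)]
    return new_src, new_tgt, beta_t


if __name__ == "__main__":
    s, t, bt = tradaboost_reweight(
        [1.0] * 4, [1.0] * 4, [0, 1, 0, 1], [0, 0, 0, 1], n_rounds=10
    )
    print(s, t, bt)
```

The asymmetry in this update is the point. Source instances that the current classifier gets wrong are treated as less relevant to the target domain and lose weight, while hard target instances gain weight. The concern raised in the abstract is that this relevance judgment rests on only a few labeled target instances, which can lead to overfitting. In a full implementation, the combined weights would be renormalized before training the next weak classifier.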
Source
Journal of Chinese Computer Systems (《小型微型计算机系统》)
2011, Issue 11, pp. 2169-2173 (5 pages)
Indexed in: CSCD; Peking University Core Journals (北大核心)
Funding
Supported by the Guangdong Science and Technology Plan Project (2008B050100040)