Abstract
To address the problem of absolutely imbalanced training data, this paper proposes an ensemble transfer learning algorithm based on a cascade structure. The algorithm consists of two parts: transfer learning and data selection. In the transfer learning stage, a weight recovery factor is introduced to address the problem that the weights of auxiliary-domain samples cannot recover in the TrAdaBoost algorithm. In the data selection stage, the algorithm uses the cascade structure to progressively remove noisy and redundant samples from the auxiliary domain, making full use of the auxiliary-domain data while keeping the target domain dominant. Experimental results on real data sets show that, when the data are absolutely imbalanced, the algorithm improves the classifier's F-measure and G-mean. The proposed algorithm can therefore alleviate the absolute imbalance of training data to a certain extent.
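The weight problem mentioned in the abstract can be illustrated with the standard TrAdaBoost update rule. The sketch below is not the paper's algorithm: the abstract does not define the weight recovery factor or the cascade, so neither is implemented. The function name `tradaboost_update`, the clamp on the target error, and the toy weights are assumptions made for illustration, and the per-round normalization of weights is omitted so that the raw values stay visible.

```python
import math

def tradaboost_update(w_aux, w_tgt, miss_aux, miss_tgt, n_rounds):
    """One round of the standard TrAdaBoost weight update.

    miss_aux / miss_tgt hold 0/1 flags: 1 if the weak learner
    misclassified that sample in this round (hypothetical inputs).
    """
    n_aux = len(w_aux)
    # Fixed decay factor for auxiliary samples; never greater than 1.
    beta = 1.0 / (1.0 + math.sqrt(2.0 * math.log(n_aux) / n_rounds))
    # Weighted error is measured on the target domain only.
    eps = sum(w * m for w, m in zip(w_tgt, miss_tgt)) / sum(w_tgt)
    # Assumed clamp so that beta_t stays well defined.
    eps = min(max(eps, 1e-10), 0.499)
    beta_t = eps / (1.0 - eps)
    # Misclassified auxiliary samples are down-weighted ...
    new_aux = [w * beta ** m for w, m in zip(w_aux, miss_aux)]
    # ... while misclassified target samples are up-weighted.
    new_tgt = [w * beta_t ** (-m) for w, m in zip(w_tgt, miss_tgt)]
    return new_aux, new_tgt

# Toy run: auxiliary sample 0 is misclassified once, then classified correctly.
w_aux, w_tgt = [1.0] * 4, [1.0] * 4
aux_after_round1, tgt_after_round1 = tradaboost_update(
    w_aux, w_tgt, [1, 0, 0, 0], [1, 0, 0, 0], n_rounds=10)
aux_after_round2, tgt_after_round2 = tradaboost_update(
    aux_after_round1, tgt_after_round1, [0, 0, 0, 0], [0, 0, 0, 0], n_rounds=10)

print(aux_after_round1[0], aux_after_round2[0])
```

In this update the raw weight of an auxiliary sample can only stay the same or shrink, so a sample penalized in an early round does not regain weight even if later weak learners classify it correctly. That is the behavior a recovery factor would need to counteract.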
Authors
YAO Susu (么素素)
WANG Baoliang (王宝亮)
HOU Yonghong (侯永宏)
School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China; Information and Network Center, Tianjin University, Tianjin 300072, China
Source
Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》)
CSCD
Peking University Core Journal (北大核心)
2018, No. 7, pp. 1145-1153 (9 pages)
Funding
National Natural Science Foundation of China, No. 61571325