
Ensemble Transfer Learning Algorithm for Absolute Imbalanced Data Classification (绝对不平衡样本分类的集成迁移学习算法)
Cited by: 9
Abstract: To address the problem of absolutely imbalanced training data, this paper proposes an ensemble transfer learning algorithm based on a cascade structure. The algorithm consists of two parts: transfer learning and data selection. In the transfer learning stage, a weight recovery factor is introduced to address the fact that the weights of auxiliary-domain samples cannot recover in the TrAdaBoost algorithm. In the data selection stage, the algorithm uses the cascade structure to progressively delete noisy and redundant samples from the auxiliary domain, making full use of the auxiliary-domain data while preserving the leading role of the target domain. Experimental results on real data sets show that, under absolute data imbalance, the algorithm improves the classifier's F-measure and G-mean. The proposed algorithm can therefore alleviate the problem of absolutely imbalanced training data to a certain extent.
Authors: YAO Susu (么素素); WANG Baoliang (王宝亮); HOU Yonghong (侯永宏) (School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China; Information and Network Center, Tianjin University, Tianjin 300072, China)
Source: Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》), CSCD, Peking University Core Journal, 2018, No. 7, pp. 1145-1153 (9 pages)
Funding: National Natural Science Foundation of China, No. 61571325
Keywords: ensemble transfer learning; cascade model; imbalanced data; TrAdaBoost
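
The abstract reports results in terms of F-measure and G-mean, two metrics commonly used for imbalanced classification. The sketch below shows their usual textbook definitions for a binary problem. It is an illustration under stated assumptions, not the authors' evaluation code: the function names, the choice of label 1 as the minority (positive) class, and the default beta = 1 are all hypothetical, and the paper's exact definitions may differ.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) for a binary problem; `positive` is the minority label (assumed)."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        else:
            if p == positive:
                fp += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def f_measure(y_true, y_pred, positive=1, beta=1.0):
    """Weighted harmonic mean of precision and recall (beta = 1 gives F1)."""
    tp, fp, _, fn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity (recall on positives) and specificity (recall on negatives)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)


# Toy data (made up for illustration): 2 positives, 8 negatives.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
always_negative = [0] * 10                         # 80% accurate, finds no positives
one_hit_one_miss = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]  # tp=1, fn=1, fp=1, tn=7

print(f_measure(y_true, always_negative), g_mean(y_true, always_negative))
print(f_measure(y_true, one_hit_one_miss), g_mean(y_true, one_hit_one_miss))
```

On the toy data, a classifier that always predicts the majority class is 80% accurate yet scores 0 on both metrics, which is why such metrics are preferred over accuracy when the classes are heavily imbalanced.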