Hybrid sampling algorithm based on probability distribution estimation (基于概率分布估计的混合采样算法)
Abstract: In class-imbalanced data, both between-class and within-class imbalance are major causes of degraded classification performance. To improve classifier performance on imbalanced data sets, a hybrid sampling algorithm based on probability distribution estimation is proposed. The algorithm samples each subclass separately according to the estimated data probability, which keeps every class balanced internally. It also expands the potential decision region of the minority class and reduces redundant information in the majority class, so that balance is improved from the global and the local perspective at the same time. Experimental results show that the algorithm improves the performance of conventional classification algorithms on imbalanced data.
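The abstract does not state which density estimator, sampling rates, or redundancy criterion the authors use, so the following is only a minimal sketch of the general idea under stated assumptions: subclass labels are already known (for example from a prior clustering step), each subclass is modelled by a diagonal Gaussian, every subclass of a class is resampled to the same size, and majority subclasses are reduced by plain random under-sampling. The names `fit_diag_gaussian`, `resample_subclass`, `hybrid_sample`, and `per_class_target` are hypothetical and do not come from the paper.

```python
import random
import statistics
from collections import defaultdict


def fit_diag_gaussian(points):
    """Estimate a per-feature mean and standard deviation for one subclass.

    Assumption: a diagonal Gaussian stands in for the paper's (unspecified)
    probability distribution estimate.
    """
    dims = list(zip(*points))
    means = [statistics.fmean(d) for d in dims]
    # pstdev is 0.0 for a single point; floor it so sampling stays well defined.
    stds = [max(statistics.pstdev(d), 1e-6) for d in dims]
    return means, stds


def resample_subclass(points, target, rng):
    """Return exactly `target` points for one subclass."""
    if len(points) >= target:
        # Under-sampling (assumed here to be uniform random selection).
        return rng.sample(points, target)
    # Over-sampling: keep the originals and add draws from the fitted Gaussian.
    means, stds = fit_diag_gaussian(points)
    synthetic = [
        tuple(rng.gauss(m, s) for m, s in zip(means, stds))
        for _ in range(target - len(points))
    ]
    return list(points) + synthetic


def hybrid_sample(data, per_class_target, seed=0):
    """Resample `data`, a list of (features, class_label, subclass_label).

    Each class ends up with about `per_class_target` points, split evenly
    across its subclasses (within-class balance), and all classes share the
    same target (between-class balance).
    """
    rng = random.Random(seed)
    groups = defaultdict(lambda: defaultdict(list))
    for x, cls, sub in data:
        groups[cls][sub].append(tuple(x))

    out = []
    for cls, subs in groups.items():
        target = max(1, per_class_target // len(subs))
        for sub, pts in subs.items():
            for p in resample_subclass(pts, target, rng):
                out.append((p, cls, sub))
    return out
```

Calling `hybrid_sample(data, per_class_target=20)` on a set with two majority subclasses and two minority subclasses would return ten points per subclass. A fuller implementation would likely replace the diagonal Gaussian with a mixture model and choose which majority points to discard by estimated density, but those details are not given in the abstract.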
Source: Control and Decision (《控制与决策》), indexed in EI, CSCD, and the Peking University Core Journals list; 2014, No. 5, pp. 815-820 (6 pages)
Funding: National Natural Science Foundation of China (61001047); Fundamental Research Funds for the Central Universities (N110618001)
Keywords: imbalanced data learning; within-class imbalance; hybrid sampling; probability distribution estimation