Symmetric Inverting Algorithm for Imbalanced Datasets Based on Gaussian Mixture Model

Cited by: 2
Abstract: Traditional classifiers perform poorly on imbalanced data. To address this, we propose a symmetric inverting algorithm based on the Gaussian mixture model and expectation maximization (GMM-EM). The core idea is to balance the data using the "3σ rule" from probability theory. First, the density functions of the majority and minority classes are obtained with the Gaussian mixture model and the EM algorithm. Second, taking the mean of the minority class as the center of symmetry, the "3σ rule" is used to determine the inverting boundary where the majority class intrudes into the minority class; the data are then inverted, and points that duplicate original minority-class data in the inverting interval are removed. If the two classes are still imbalanced at this point, a probability density enhancing method is applied within the inverting region until the data are balanced. Finally, the algorithm and other methods are assessed with a decision tree classifier on 14 imbalanced datasets selected from the UCI and KEEL repositories. Experimental results show that the algorithm is more effective than the other methods.
Authors: CHEN Gang; WANG Lijuan (School of Science, Dalian Maritime University, Dalian 116026, China)
Source: Information and Control (《信息与控制》), CSCD, Peking University Core Journal, 2020, No. 2, pp. 203-209, 218 (8 pages)
Funding: National Natural Science Foundation of China (11571056).
Keywords: imbalanced dataset; data classification; symmetric inverting; GMM-EM algorithm
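
The inverting step described in the abstract can be illustrated with a minimal one-dimensional sketch. This is not the authors' implementation: the function name `symmetric_flip`, the parameter `radius_sigmas`, and the choice to mirror minority samples about their mean are assumptions made for illustration. The sketch also replaces the GMM-EM density estimate with a single sample mean and standard deviation, and it omits the probability density enhancing step.

```python
import statistics


def symmetric_flip(minority, radius_sigmas=3.0):
    """Return new points made by mirroring minority samples about their mean.

    Hypothetical 1-D sketch: a single Gaussian (sample mean and standard
    deviation) stands in for the GMM-EM density estimate in the abstract.
    """
    mu = statistics.fmean(minority)        # center of symmetry
    sigma = statistics.pstdev(minority)    # population standard deviation
    lower = mu - radius_sigmas * sigma     # 3-sigma inverting region
    upper = mu + radius_sigmas * sigma

    seen = set(minority)                   # exact-match duplicate check
    flipped = []
    for x in minority:
        if not (lower <= x <= upper):
            continue                       # skip samples outside the region
        mirrored = 2 * mu - x              # reflection about the mean
        if mirrored in seen:
            continue                       # drop points that repeat existing data
        seen.add(mirrored)
        flipped.append(mirrored)
    return flipped


if __name__ == "__main__":
    # Mean is 4.0, so 1 -> 7, 2 -> 6, 5 -> 3, 8 -> 0; 4 maps to itself and is dropped.
    print(symmetric_flip([1.0, 2.0, 4.0, 5.0, 8.0]))  # [7.0, 6.0, 3.0, 0.0]
```

The duplicate check uses exact floating-point equality, which is enough for a toy example; real data would likely need a distance tolerance and a multivariate treatment.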
