
Hybrid Sampling Algorithm Based on Classification Error Rate of Samples in Cluster (基于簇内样本平均分类错误率的混合采样算法)

Cited by: 3
Abstract: To address the poor classification performance on class-imbalanced data, this paper proposes SABER, a hybrid Sampling Algorithm Based on the average classification Error Rate of samples within clusters. The algorithm first applies SMOTE to the minority classes to increase their sample count, then adds a portion of the samples of each class to a balanced sample set and trains an initial classifier on that set. It then runs multiple iterations. In each iteration, K-means clusters the remaining majority-class samples that have not yet been used to train the classifier; the classifier's average classification error rate is computed for the samples in each cluster; the representative points of the clusters with the highest average error rates are added to the balanced sample set; roughly the same number of minority-class samples, drawn at random without replacement, is added to the balanced sample set as well; and the classifier is retrained on the balanced sample set. Experimental results show that SABER improves classification performance on minority-class samples as well as overall classification performance.
Authors: XIONG Xuan-rui (熊炫睿); CHEN Gao-sheng (陈高升); XIONG Lian (熊炼); ZHANG Yuan (张媛); CHENG Zhan-wei (程占伟); FU Ming-kai (付明凯). Affiliations: College of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China; Chongqing Institute of Engineering, Chongqing 400056, China.
Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), indexed in CSCD and the Peking University Core Journals list, 2021, No. 8, pp. 1683-1687 (5 pages).
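Funding: Supported by the Chongqing Basic Science and Frontier Technology Research Special Project (cstc2017jcyjAX0135, cstc2020jcyj-msxmX0636), the Science and Technology Research Project of the Chongqing Municipal Education Commission (KJQN201801908), and the Research Start-up Fund of Chongqing University of Posts and Telecommunications (A2015-14).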
Keywords: class imbalance; hybrid sampling; K-means algorithm; SMOTE algorithm
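The selection step of one iteration, as the abstract describes it, can be sketched in plain Python. This is only an illustrative sketch, not the authors' implementation: the function names are hypothetical, the cluster labels are assumed to come from a K-means run done elsewhere, the predictions are assumed to come from the current classifier, and taking the sample closest to the cluster centroid as the "representative point" is an assumption, since the abstract does not define it.

```python
import math
import random
from collections import defaultdict


def cluster_error_rates(labels, y_true, y_pred):
    """Mean misclassification rate of the samples in each cluster."""
    wrong, total = defaultdict(int), defaultdict(int)
    for c, t, p in zip(labels, y_true, y_pred):
        total[c] += 1
        wrong[c] += int(t != p)
    return {c: wrong[c] / total[c] for c in total}


def representative(points, indices):
    """Index of the sample closest to the centroid (assumed definition)."""
    dim = len(points[indices[0]])
    centroid = [sum(points[i][d] for i in indices) / len(indices)
                for d in range(dim)]
    return min(indices, key=lambda i: math.dist(points[i], centroid))


def select_representatives(points, labels, y_true, y_pred, top_m):
    """Pick one representative from each of the top_m worst clusters."""
    rates = cluster_error_rates(labels, y_true, y_pred)
    # Highest error rate first; ties are broken by cluster id.
    worst = sorted(rates, key=lambda c: (-rates[c], c))[:top_m]
    members = defaultdict(list)
    for i, c in enumerate(labels):
        members[c].append(i)
    return [representative(points, members[c]) for c in worst], rates


def draw_minority(pool, k, rng):
    """Remove and return up to k random minority indices (no replacement)."""
    picked = rng.sample(pool, min(k, len(pool)))
    for i in picked:
        pool.remove(i)
    return picked


# Toy data: eight unused majority-class samples in three clusters.
points = [(0.0, 0.0), (0.3, 0.0), (0.1, 0.3),
          (5.0, 5.0), (5.3, 5.0), (5.1, 5.4),
          (9.0, 0.0), (9.2, 0.2)]
labels = [0, 0, 0, 1, 1, 1, 2, 2]   # assumed K-means assignments
y_true = [0] * 8                    # all samples belong to the majority class
y_pred = [0, 0, 1, 1, 1, 0, 0, 0]   # assumed classifier predictions

chosen, rates = select_representatives(points, labels, y_true, y_pred, top_m=2)
minority_pool = [100, 101, 102, 103]  # hypothetical minority sample indices
matched = draw_minority(minority_pool, len(chosen), random.Random(0))
print(chosen)  # -> [3, 0]
```

The samples in `chosen` and `matched` would then be added to the balanced sample set before the classifier is retrained. Removing the drawn indices from `minority_pool` is what makes the sampling "without replacement" across iterations.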
