The Nearest Neighbor Triangle Regions Synthetic Minority Oversampling Technique for Imbalanced Data Classification (cited by: 4)
Abstract: When the traditional synthetic minority oversampling technique (SMOTE) is applied to imbalanced datasets whose class regions overlap, it may generate many synthetic samples that lie closer to the majority class, or that even cross the class boundary, which degrades overall classification performance. To address this, a nearest-neighbor triangle-region SMOTE method is proposed. Synthetic samples are generated only inside the nearest triangular region of the minority-class samples, and any synthetic sample that lies closer to the majority class is deleted. The generated samples therefore stay closer to the minority class and do not cross the original class boundary. Experiments were carried out on artificial datasets and on modified UCI datasets, and the method was compared with the original SMOTE using G-mean and F-value as evaluation metrics. The results confirm that the improved SMOTE outperforms the original SMOTE on datasets with overlapping class regions.
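The two steps named in the abstract (sampling inside a nearest-neighbor triangle, then deleting samples that lie closer to the majority class) can be illustrated with a minimal sketch. The abstract does not say how the triangle is built, how points are drawn inside it, or how closeness to a class is measured, so the choices below are assumptions rather than the authors' implementation: the triangle is formed by a minority sample and its two nearest minority neighbors, points are drawn uniformly with Dirichlet(1, 1, 1) barycentric weights, and a candidate is kept only if its nearest minority sample is at least as close as its nearest majority sample. The name `triangle_smote` and its parameters are hypothetical.

```python
import numpy as np


def triangle_smote(X_min, X_maj, n_new, rng=None, max_tries=50):
    """Illustrative sketch, not the paper's code.

    Draw candidates inside the triangle spanned by a minority sample and
    its two nearest minority neighbors (assumed construction), and reject
    candidates whose nearest majority sample is closer than their nearest
    minority sample (assumed deletion rule).
    """
    rng = np.random.default_rng(rng)
    X_min = np.asarray(X_min, dtype=float)
    X_maj = np.asarray(X_maj, dtype=float)
    if len(X_min) < 3:
        raise ValueError("need at least three minority samples")

    # Pairwise Euclidean distances among minority samples.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)          # exclude each point from its own neighbors
    nn2 = np.argsort(d, axis=1)[:, :2]   # indices of the two nearest minority neighbors

    synthetic = []
    attempts = 0
    # Cap the number of attempts so heavy overlap cannot loop forever.
    while len(synthetic) < n_new and attempts < max_tries * n_new:
        attempts += 1
        i = rng.integers(len(X_min))
        a, b, c = X_min[i], X_min[nn2[i, 0]], X_min[nn2[i, 1]]
        # Dirichlet(1, 1, 1) weights give a uniform point inside triangle abc.
        w = rng.dirichlet(np.ones(3))
        cand = w[0] * a + w[1] * b + w[2] * c
        d_to_min = np.linalg.norm(X_min - cand, axis=1).min()
        d_to_maj = np.linalg.norm(X_maj - cand, axis=1).min()
        if d_to_min <= d_to_maj:         # keep only candidates nearer the minority class
            synthetic.append(cand)
    return np.array(synthetic).reshape(-1, X_min.shape[1])
```

Because every candidate is a convex combination of three minority samples, it cannot leave the convex hull of the minority class. In heavily overlapping data the function may return fewer than `n_new` samples, since rejected candidates are not replaced once the attempt cap is reached.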
Authors: LIU Dan (刘丹); WANG Xiao-lan (王晓兰); XING Sheng (邢胜) (College of Computer Science and Engineering, Cangzhou Normal University; Department of Information Engineering, Cangzhou Technical College, Cangzhou 061001, China)
Source: Science Technology and Engineering (《科学技术与工程》), Peking University Core Journal, 2018, No. 28, pp. 215-219 (5 pages)
Funding: Supported by the National Natural Science Foundation of China (71371063, 61170040, 61672205)
Keywords: imbalanced data; oversampling technique; classification; nearest neighbor rule
