
Hybrid Algorithm of DBSCAN and Improved SMOTE for Oversampling (cited by: 15)
Abstract: Conventional oversampling algorithms such as SMOTE (Synthetic Minority Over-sampling Technique) ignore within-class imbalance, extend the classification region of the minority class, and synthesize highly similar samples. To address these problems by jointly considering within-class imbalance and the diversity of synthetic samples, this paper proposes DB-MCSMOTE (DBSCAN and Midpoint Centroid Synthetic Minority Over-sampling Technique), an oversampling algorithm that integrates DBSCAN with an improved SMOTE. The algorithm clusters the minority class samples with DBSCAN, then uses the proposed cluster density distribution function to compute the density and sampling weight of each cluster. Within each cluster, the improved SMOTE algorithm (MCSMOTE) oversamples on the lines connecting minority class samples that lie far apart, which increases the diversity of the synthetic samples and yields a new dataset that is balanced both between and within classes. Experiments on one two-dimensional synthetic dataset and nine UCI datasets show that DB-MCSMOTE effectively improves classifier performance on both the minority class and the overall dataset.
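The pipeline outlined in the abstract (cluster the minority class, weight each cluster by density, interpolate between distant members) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' DB-MCSMOTE: the abstract gives neither the cluster density distribution function nor the MCSMOTE construction, so the density formula (cluster size divided by mean distance to the centroid), the inverse-density weighting, and the "farthest member" partner rule are all assumptions, and the function names `dbscan`, `cluster_weights`, and `oversample` are hypothetical.

```python
import math
import random


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN; returns one label per point (-1 marks noise)."""
    labels = [None] * len(points)

    def neighbors(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: join the cluster, do not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = neighbors(j)
            if len(nb) >= min_pts:  # core point: keep growing the cluster
                queue.extend(nb)
    return labels


def cluster_weights(points, labels):
    """ASSUMPTION: density = size / mean distance to centroid,
    and the sampling weight is proportional to 1 / density."""
    clusters = {}
    for p, label in zip(points, labels):
        if label != -1:  # noise points are not oversampled
            clusters.setdefault(label, []).append(p)
    inverse_density = {}
    for label, members in clusters.items():
        centroid = [sum(c) / len(members) for c in zip(*members)]
        spread = sum(math.dist(p, centroid) for p in members) / len(members)
        inverse_density[label] = (spread + 1e-12) / len(members)
    total = sum(inverse_density.values())
    return clusters, {k: v / total for k, v in inverse_density.items()}


def oversample(points, n_new, eps=1.0, min_pts=3, seed=0):
    """Generate roughly n_new synthetic minority samples (quotas are rounded)."""
    rng = random.Random(seed)
    labels = dbscan(points, eps, min_pts)
    clusters, weights = cluster_weights(points, labels)
    synthetic = []
    for label, members in clusters.items():
        for _ in range(round(n_new * weights[label])):
            a = rng.choice(members)
            # ASSUMPTION: the "distant" partner is the farthest member of the cluster
            b = max(members, key=lambda q: math.dist(a, q))
            t = rng.random()
            # Interpolate on the segment between a and b, as in SMOTE
            synthetic.append(tuple(x + t * (y - x) for x, y in zip(a, b)))
    return synthetic
```

Calling `oversample(minority_points, n_new)` on a list of coordinate tuples returns the synthetic points, which would then be appended to the minority class. Under the inverse-density assumption, sparse clusters receive a larger share of the new samples than dense ones, which is one plausible way to reduce within-class imbalance.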
Authors: WANG Liang (王亮); YE Jimin (冶继民) (School of Mathematics and Statistics, Xidian University, Xi'an 710126, China)
Source: Computer Engineering and Applications (《计算机工程与应用》), CSCD, Peking University Core Journal, 2020, No. 18, pp. 111-118 (8 pages)
Funding: National Natural Science Foundation of China (No. 61573014); Fundamental Research Funds for the Central Universities (No. JB180702).
Keywords: oversampling; within-class imbalance; minority class; diversity; Synthetic Minority Over-sampling Technique (SMOTE) algorithm; Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm
