An oversampling algorithm for imbalanced datasets
Abstract: Traditional oversampling algorithms mitigate class imbalance by synthesizing minority-class samples, but they do not account for the noise they may generate or for uneven sample distribution. To address these problems, SK-SMOTE, an oversampling algorithm based on clustering and an improved SMOTE, is proposed. Before clustering, the algorithm synthesizes a portion of minority-class samples to increase their number. Each synthesized sample is assigned weights according to the classes and distances of its neighboring samples, and it is retained only if the sum of the weights exceeds a set value. After the number of minority-class samples has been increased, the KMeans algorithm is used for clustering, and clusters containing more minority-class samples are retained. Oversampling is then performed within the clusters, with sparser clusters synthesizing more minority-class samples. Imbalanced datasets from the UCI and KEEL repositories are selected, SVM, RF, and KNN are used as classifiers, and several classic SMOTE algorithms are compared with SK-SMOTE in multiple groups of experiments. The experimental results show that SK-SMOTE effectively balances imbalanced datasets and achieves better results than traditional oversampling algorithms, especially on datasets with higher imbalance ratios.
Authors: ZHANG Wenhui (张文辉); LUO Honghao (罗鸿豪) (School of Computer and Information Security, Guilin University of Electronic Technology, Guilin 541004, China)
Source: Journal of Guilin University of Electronic Technology, 2023, No. 5, pp. 363-370 (8 pages)
Funding: National Natural Science Foundation of China (61966007); Guangxi Natural Science Foundation (2022GXNSFAA035629).
Keywords: SMOTE algorithm; imbalanced data; scoring mechanism; K-means algorithm; over-sampling
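
The three stages named in the abstract (pre-oversampling with a score-based filter, KMeans clustering with cluster selection, and sparsity-weighted oversampling inside clusters) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the abstract does not give the weighting formula, the cluster-retention rule, or the sparsity measure, so the inverse-distance `score`, the `min_share` rule, the mean-pairwise-distance `sparsity`, and every function and parameter name below are assumptions chosen for illustration only.

```python
import math
import random

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def smote(minority, n_new, k, rng):
    """Plain SMOTE step: interpolate a random seed with one of its k nearest minority neighbours."""
    out = []
    if len(minority) < 2:
        return out
    for _ in range(n_new):
        seed = rng.choice(minority)
        nbrs = sorted((p for p in minority if p is not seed), key=lambda p: dist(seed, p))[:k]
        other, t = rng.choice(nbrs), rng.random()
        out.append([x + t * (z - x) for x, z in zip(seed, other)])
    return out

def score(point, X, y, k, minority_label):
    """ASSUMED scoring rule: inverse-distance-weighted share of minority neighbours, in [0, 1]."""
    nbrs = sorted(zip(X, y), key=lambda s: dist(point, s[0]))[:k]
    w = [1.0 / (dist(point, p) + 1e-9) for p, _ in nbrs]
    return sum(wi for wi, (_, lab) in zip(w, nbrs) if lab == minority_label) / sum(w)

def kmeans(X, n_clusters, rng, n_iter=20):
    """Basic Lloyd iterations; returns one cluster index per point."""
    centers = rng.sample(X, n_clusters)
    assign = [0] * len(X)
    for _ in range(n_iter):
        assign = [min(range(n_clusters), key=lambda c: dist(p, centers[c])) for p in X]
        for c in range(n_clusters):
            members = [p for p, a in zip(X, assign) if a == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign

def sparsity(pts):
    """ASSUMED sparsity measure: mean pairwise distance between the points."""
    pairs = [(a, b) for i, a in enumerate(pts) for b in pts[i + 1:]]
    return sum(dist(a, b) for a, b in pairs) / len(pairs)

def sk_smote_sketch(X, y, minority_label=1, k=3, n_clusters=3,
                    threshold=0.5, min_share=0.5, seed=0):
    rng = random.Random(seed)
    minority = [p for p, lab in zip(X, y) if lab == minority_label]
    n_major = len(X) - len(minority)
    # Stage 1: synthesize part of the deficit, keep candidates whose score exceeds the threshold.
    candidates = smote(minority, (n_major - len(minority)) // 2, k, rng)
    kept = [c for c in candidates if score(c, X, y, k, minority_label) > threshold]
    X1, y1 = X + kept, y + [minority_label] * len(kept)
    # Stage 2: cluster everything, retain clusters where the minority share is high enough.
    assign = kmeans(X1, n_clusters, rng)
    clusters = []
    for c in range(n_clusters):
        labs = [lab for lab, a in zip(y1, assign) if a == c]
        pts = [p for p, lab, a in zip(X1, y1, assign) if a == c and lab == minority_label]
        if len(pts) >= 2 and len(pts) / len(labs) > min_share:
            clusters.append(pts)
    # Stage 3: split the remaining deficit across clusters in proportion to sparsity.
    remaining = n_major - len(minority) - len(kept)
    sp = [sparsity(pts) for pts in clusters]
    new = []
    for pts, s in zip(clusters, sp):
        n = round(remaining * s / sum(sp)) if sum(sp) > 0 else 0
        new += smote(pts, n, k, rng)
    return X1 + new, y1 + [minority_label] * len(new)

# Toy usage: 20 majority points near the origin, 5 minority points near (5, 5).
X = [[i % 5 * 0.5, i // 5 * 0.5] for i in range(20)]
X += [[5.0, 5.0], [5.5, 5.0], [5.0, 5.5], [5.5, 5.5], [5.25, 5.25]]
y = [0] * 20 + [1] * 5
X_res, y_res = sk_smote_sketch(X, y)
print(y_res.count(0), y_res.count(1))
```

The sketch stays in the standard library so it runs anywhere; a practical version would more likely build on an existing KMeans and nearest-neighbour implementation and tune `threshold`, `min_share`, and `n_clusters` per dataset.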