Random forest imbalanced data classification algorithm based on DBSCAN clustering decomposition and oversampling
Abstract Traditional methods for classifying imbalanced data tend either to generate a large number of false samples or to lose data. To address these problems, a random forest imbalanced data classification algorithm based on DBSCAN clustering decomposition and oversampling is proposed. First, the density-based DBSCAN clustering decomposition algorithm is applied to the majority class of the imbalanced dataset, which reduces the dominance of the majority class without any data loss. Second, the minority class is oversampled with the Borderline-SMOTE algorithm, which increases the number of minority samples and yields a more balanced dataset. This effectively addresses the over-fitting caused by generating too many false samples during oversampling, and at the same time avoids the data loss caused by under-sampling. Finally, given the clustering decomposition and oversampling steps, random forest is shown to achieve better results than SVM, Adaboost, Bagging, and XGBoost. Experimental comparison with other popular algorithms on KEEL public datasets shows that the proposed algorithm effectively improves classification performance on imbalanced data.
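The first step in the abstract, decomposing the majority class with DBSCAN, can be illustrated with a minimal standard-library sketch. It is not the authors' implementation: the helper names `dbscan` and `decompose_majority`, the toy data, the values of `eps` and `min_pts`, and the handling of noise points (grouped into one extra sub-class) are all assumptions made for illustration. The Borderline-SMOTE and random forest steps are not shown.

```python
import math
from collections import Counter


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN. Returns one cluster id per point, -1 for noise."""
    labels = [None] * len(points)

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nb = neighbors(i)
        if len(nb) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nb if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point: border point
            if labels[j] is not None:
                continue  # already assigned, do not expand again
            labels[j] = cluster
            nb_j = neighbors(j)
            if len(nb_j) >= min_pts:
                queue.extend(nb_j)  # j is a core point, so keep growing the cluster
    return labels


def decompose_majority(X, y, majority_label, eps, min_pts):
    """Relabel each majority sample with a per-cluster sub-class label.

    No sample is dropped. Noise points share one sub-class, which is an
    assumption of this sketch.
    """
    maj_idx = [i for i, lab in enumerate(y) if lab == majority_label]
    clusters = dbscan([X[i] for i in maj_idx], eps, min_pts)
    new_y = list(y)
    for i, c in zip(maj_idx, clusters):
        new_y[i] = f"{majority_label}_c{c}" if c >= 0 else f"{majority_label}_noise"
    return new_y


# Toy data: two dense majority groups (8 samples) and 2 minority samples.
X = [(0, 0), (0, 1), (1, 0), (1, 1),
     (10, 10), (10, 11), (11, 10), (11, 11),
     (5, 5), (5, 6)]
y = ["maj"] * 8 + ["min"] * 2

new_y = decompose_majority(X, y, "maj", eps=1.5, min_pts=3)
print(Counter(y))      # class sizes before decomposition
print(Counter(new_y))  # class sizes after decomposition
```

On this toy set the single 8-sample majority class becomes two 4-sample sub-classes, so the largest class is only twice the size of the minority class while every original sample is kept. In a full pipeline, predictions for any sub-class would be mapped back to the original majority label.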
Authors ZHAO Xiao-qiang (赵小强); YAO Qing-lei (姚青磊) (College of Electrical Engineering and Information Engineering, Lanzhou Univ. of Tech., Lanzhou 730050, China; Key Laboratory of Advanced Control of Industrial Processes of Gansu Province, Lanzhou Univ. of Tech., Lanzhou 730050, China; National Electrical and Control Engineering Experimental Teaching Center, Lanzhou Univ. of Tech., Lanzhou 730050, China)
Source Journal of Lanzhou University of Technology (《兰州理工大学学报》), CAS, Peking University Core Journal, 2023, No. 6, pp. 80-89 (10 pages)
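Funding National Natural Science Foundation of China (62263021); Gansu Provincial University Industry Support Program (2023CYZC-24); Gansu Provincial Science and Technology Program (21YF5GA072).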
Keywords imbalanced data; classification algorithm; DBSCAN; random forest