Feature selection mechanism based on binary queue search for unbalanced data
Abstract: Imbalanced data (classes that are unevenly distributed) and redundant features greatly increase the difficulty of accurate classification. With the goal of optimizing the prediction accuracy of the learning algorithm, and in combination with the synthetic minority oversampling technique (SMOTE), a wrapper feature selection algorithm for imbalanced data, BQSA, was designed on the basis of a binary queuing search method. Experiments were carried out on 14 software fault prediction datasets from the PROMISE repository. The effect of the oversampling ratio was tested first, confirming that synthetic minority oversampling has a positive effect on classification prediction for highly imbalanced data and yielding the best oversampling rate. BQSA was then compared with similar algorithms, confirming that BQSA combined with synthetic minority oversampling achieves better prediction accuracy and performs better on classification sensitivity, specificity, and area under the curve (AUC).
Author: GUO Jia (郭嘉) (School of Electronic Information Engineering, Zhengzhou Sisa University, Zhengzhou 451150, Henan, China)
Source: Microelectronics & Computer (《微电子学与计算机》), 2021, No. 8, pp. 45-52 (8 pages)
Funding: National Natural Science Foundation of China (62110817); Key Scientific Research Project of Higher Education Institutions of Henan Province (19B520028, 19B520029).
Keywords: feature selection; imbalanced data; queuing search algorithm; synthetic minority oversampling; learning algorithm
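
The pipeline outlined in the abstract (oversample the minority class, then score binary feature masks by the accuracy of a learner) can be illustrated with a rough sketch. This is not the author's BQSA: the abstract does not give the queuing search update rules, so a plain random bit-flip search stands in for them, and the leave-one-out 1-nearest-neighbour learner, the toy data, and all function names (`smote`, `loo_accuracy`, `bit_flip_search`) are assumptions made only for illustration.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote(minority, n_new, k=3, rng=None):
    """SMOTE-style oversampling: each synthetic point is interpolated between
    a minority sample and one of its k nearest minority neighbours."""
    if rng is None:
        rng = random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sq_dist(p, base),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def loo_accuracy(X, y, mask):
    """Wrapper fitness (assumed learner): leave-one-out 1-NN accuracy using
    only the features whose mask bit is 1."""
    if not any(mask):
        return 0.0
    rows = [[v for v, bit in zip(r, mask) if bit] for r in X]
    hits = 0
    for i, r in enumerate(rows):
        nearest = min(
            (j for j in range(len(rows)) if j != i),
            key=lambda j: sq_dist(rows[j], r),
        )
        hits += y[nearest] == y[i]
    return hits / len(rows)


def bit_flip_search(X, y, n_iter=30, rng=None):
    """Stand-in for the binary queuing search: flip one random bit per step
    and keep the mask if accuracy improves, or ties with fewer features."""
    if rng is None:
        rng = random.Random(0)
    best = [1] * len(X[0])
    best_fit = loo_accuracy(X, y, best)
    for _ in range(n_iter):
        cand = best[:]
        cand[rng.randrange(len(cand))] ^= 1
        if not any(cand):
            continue  # never evaluate an empty feature set
        fit = loo_accuracy(X, y, cand)
        if fit > best_fit or (fit == best_fit and sum(cand) < sum(best)):
            best, best_fit = cand, fit
    return best, best_fit


# Toy data (made up): feature 0 separates the classes, feature 1 is noise.
majority = [[0.1 * i, float((i * 7) % 5)] for i in range(8)]
minority = [[2.0, 1.0], [2.2, 4.0], [2.1, 0.0]]
synthetic = smote(minority, n_new=5, k=2)  # balance the classes 8 : 8
X = majority + minority + synthetic
y = [0] * len(majority) + [1] * (len(minority) + len(synthetic))
mask, fitness = bit_flip_search(X, y)
print(mask, fitness)
```

For brevity the sketch oversamples before evaluation; in a real experiment the synthetic samples would normally be generated from the training folds only, so that no test sample is scored against points interpolated from itself.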