
Clustering-based under-sampling ensemble method for software defect prediction (cited by: 3)
Abstract: To alleviate the impact of class imbalance on the performance of prediction models, a cluster-based under-sampling ensemble method (CBUE) is proposed. The majority class is first clustered. N subsets of the majority class are then selected with replacement according to the distribution of the clustering result, that is, the relative size of each cluster, and each subset is combined with all minority-class instances to form one of N new training sets. N classifiers are trained on the N training sets and combined by majority vote into an ensemble classifier that predicts new data. CBUE is evaluated on the NASA datasets using balance, G-mean, and AUC as evaluation metrics. Experimental results show that the method outperforms five classical baseline methods (ROS, RUS, SMOTE, RF, and NB) in most cases.
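The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. The abstract does not name the clustering algorithm, the subset size, or the base classifier, so the sketch assumes plain k-means, a majority subset about as large as the minority class, and a nearest-centroid base learner. All function names (`kmeans`, `cbue_training_sets`, `train_nearest_centroid`, `cbue_predict`) and the toy data are hypothetical.

```python
import math
import random
from collections import Counter

def kmeans(points, k, rng, iters=20):
    """Plain k-means (an assumed clustering choice); returns one cluster label per point."""
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Move each center to the mean of its members (unchanged if the cluster is empty).
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels

def cbue_training_sets(majority, minority, n_sets, k, seed=0):
    """Build n_sets training sets: a cluster-proportional majority sample plus all minority rows."""
    rng = random.Random(seed)
    labels = kmeans(majority, k, rng)
    clusters = [[p for p, l in zip(majority, labels) if l == c] for c in range(k)]
    clusters = [members for members in clusters if members]
    target = len(minority)  # assumption: subset size roughly matches the minority class
    sets = []
    for _ in range(n_sets):
        subset = []
        for members in clusters:
            # Each cluster contributes in proportion to its share of the majority class.
            quota = max(1, round(target * len(members) / len(majority)))
            subset += rng.choices(members, k=quota)  # sampling with replacement
        # Label 0 = majority (non-defective), label 1 = minority (defective).
        sets.append([(x, 0) for x in subset] + [(x, 1) for x in minority])
    return sets

def train_nearest_centroid(data):
    """Stand-in base learner (assumed): predict the class whose centroid is closest."""
    centroids = {}
    for label in (0, 1):
        rows = [x for x, y in data if y == label]
        centroids[label] = tuple(sum(col) / len(rows) for col in zip(*rows))
    return lambda x: min(centroids, key=lambda y: math.dist(x, centroids[y]))

def cbue_predict(classifiers, x):
    """Combine the N classifiers by majority vote."""
    votes = Counter(clf(x) for clf in classifiers)
    return votes.most_common(1)[0][0]

# Toy data: 200 majority (non-defective) and 20 minority (defective) instances.
data_rng = random.Random(1)
majority = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(200)]
minority = [(data_rng.gauss(4, 1), data_rng.gauss(4, 1)) for _ in range(20)]

sets = cbue_training_sets(majority, minority, n_sets=5, k=3)
classifiers = [train_nearest_centroid(s) for s in sets]
```

Using an odd number of classifiers avoids ties in the vote. Any classifier could replace the nearest-centroid stand-in, since each base learner only needs to map an instance to a label.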
Source: Computer Engineering and Design (《计算机工程与设计》), PKU Core Journal, 2016, No. 7, pp. 1805-1810, 1891 (7 pages)
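Funding: National Natural Science Foundation of China (61202006, 61272424); Open Project Fund of the State Key Laboratory for Novel Software Technology (KFKT2012B29); Natural Science Foundation of Jiangsu Province (BK2010277); Science and Technology Innovation Fund of Jiangsu Province (BC2013167); Natural Science Foundation of the Higher Education Institutions of Jiangsu Province (12KJB520014)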
Keywords: class imbalance learning; software defect prediction; ensemble learning method; under-sampling; clustering