
Software defects prediction based on under-sampling and ensemble algorithm

Cited by: 7
Abstract: Software defect prediction is an important means of improving testing efficiency and assuring software reliability. To improve prediction accuracy, a defect prediction model is proposed that combines under-sampling with an ensemble of decision tree classifiers. Because software defect data are class-imbalanced, the sampling degree is first determined from the imbalance ratio of the data, and random under-sampling is applied to rebalance the data. Several decision tree sub-classifiers are then trained using Bagging's random sampling. Finally, the prediction model is formed by majority voting among the sub-classifiers. Simulation experiments were carried out on the public NASA MDP software defect datasets. The results show that, compared with three baseline methods, the proposed model reduces the probability of false alarm (PF) by more than 10% while maintaining the probability of detection, and its comprehensive evaluation indices improve significantly. The model achieves a low PF together with high prediction accuracy and stability.
Author: LI Yong
Source: Journal of Computer Applications (《计算机应用》), indexed in CSCD and the Peking University Core Journals list, 2014, No. 8, pp. 2291-2294, 2310 (5 pages)
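Funding: Scientific Research Program of Higher Education Institutions of Xinjiang Uygur Autonomous Region (XJEDU2012S28); Youth Fund for Humanities and Social Sciences Research of the Ministry of Education (11YJC870014); National Natural Science Foundation of China (61262065); Key Laboratory Fund of Xinjiang Normal University (WLYQ2012108)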
Keywords: software defect prediction; class imbalanced data; under-sampling; decision tree; ensemble algorithm
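
The three steps named in the abstract (under-sampling by imbalance ratio, Bagging, majority voting) can be illustrated with a minimal, dependency-free sketch. This is not the author's implementation: the function names, the 1:1 target balance, the choice of 11 sub-classifiers, and the toy data are all assumptions made for illustration, and a one-level decision tree (a stump) stands in for the full decision tree learner to keep the example short. The `pd_pf` helper uses the usual definitions PD = TP / (TP + FN) and PF = FP / (FP + TN).

```python
import random
from collections import Counter


def undersample(X, y, rng):
    """Randomly drop majority-class (label 0) rows until the classes are about 1:1."""
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    ratio = len(neg) / len(pos)  # imbalance ratio (assumes both classes are present)
    # Assumed sampling degree: keep 1/ratio of the majority class.
    n_keep = min(len(neg), round(len(neg) / ratio))
    keep = pos + rng.sample(neg, n_keep)
    return [X[i] for i in keep], [y[i] for i in keep]


def fit_stump(X, y):
    """Stand-in base learner: best single (feature, threshold) split by training error."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            for hi_label in (0, 1):  # label predicted when row[f] > t
                errors = sum((hi_label if row[f] > t else 1 - hi_label) != label
                             for row, label in zip(X, y))
                if best is None or errors < best[0]:
                    best = (errors, f, t, hi_label)
    _, f, t, hi_label = best
    return lambda row: hi_label if row[f] > t else 1 - hi_label


def fit_ensemble(X, y, n_trees=11, seed=0):
    """Rebalance once, then train each sub-classifier on a bootstrap sample."""
    rng = random.Random(seed)
    Xb, yb = undersample(X, y, rng)
    models = []
    for _ in range(n_trees):
        # Bagging: draw len(Xb) rows with replacement from the balanced data.
        idx = [rng.randrange(len(Xb)) for _ in range(len(Xb))]
        models.append(fit_stump([Xb[i] for i in idx], [yb[i] for i in idx]))
    return models


def predict(models, row):
    """Majority vote over the sub-classifiers (an odd n_trees avoids ties)."""
    votes = Counter(model(row) for model in models)
    return votes.most_common(1)[0][0]


def pd_pf(y_true, y_pred):
    """Return (probability of detection, probability of false alarm)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    return tp / (tp + fn), fp / (fp + tn)


# Hypothetical toy data: one metric per module, 20 clean modules, 4 defective.
X = [[v] for v in range(20)] + [[100 + v] for v in range(4)]
y = [0] * 20 + [1] * 4
models = fit_ensemble(X, y)
predictions = [predict(models, row) for row in X]
print(pd_pf(y, predictions))
```

Rebalancing before bootstrapping means every sub-classifier sees roughly equal numbers of defective and clean modules, which is the motivation the abstract gives for under-sampling. On real data a full decision tree learner would replace `fit_stump`.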


