
Class-Imbalance Sparse Reconstruction Metric Learning for Software Defect Prediction (cited by: 3)

Prediction of Defect of Class-imbalance Sparse Reconstruction Metric Learning Software
Abstract Software defect prediction (SDP) is an important means of improving software quality. To improve prediction performance, many recent advances in machine learning have been applied to SDP. However, SDP datasets usually have an imbalanced class distribution, which degrades prediction performance. To address this problem, we propose a software defect prediction method termed class-imbalance sparse reconstruction metric learning (CSRML). CSRML first introduces a cost-sensitive factor into metric learning, so that it learns a distance-metric feature matrix while accounting for the unequal costs of different misclassification errors. Second, weights are added to the objective function to further improve the accuracy of distance metric learning for minority-class samples. Finally, to handle class imbalance in the prediction phase, an improved weighted KNN (IWKNN) algorithm is used to predict the labels of test samples. Experiments on the standard NASA software defect prediction datasets show that the proposed method improves recall and F-measure, and thereby classification performance.
Authors SHI Zuo-ting (史作婷); WU Di (吴迪); JING Xiao-yuan (荆晓远); WU Fei (吴飞) (School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210003, China; State Key Laboratory of Software Engineering, School of Computer, Wuhan University, Wuhan 430072, China; School of Automation, Nanjing University of Posts and Telecommunications, Nanjing 210003, China)
Source Computer Technology and Development (《计算机技术与发展》), 2018, No. 6, pp. 125-128, 136 (5 pages in total)
Funding National Natural Science Foundation of China (61272273)
Keywords software defect prediction; class imbalance; improved weighted KNN (IWKNN); metric learning
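
The abstract describes the prediction phase only at a high level: a learned distance metric followed by a weighted KNN vote that compensates for class imbalance, evaluated with recall and F-measure. The Python sketch below illustrates that general idea and is not the published CSRML or IWKNN procedure. The matrix `M` is assumed to be given (the abstract does not state how it is learned), and the inverse-class-frequency and inverse-distance weighting are illustrative assumptions. The function names `weighted_knn_predict` and `recall_and_f_measure` are hypothetical.

```python
import numpy as np


def weighted_knn_predict(X_train, y_train, x, M, k=3, eps=1e-12):
    """Predict the label of x with a class- and distance-weighted k-NN vote.

    M is a positive semi-definite matrix defining the squared distance
    d(a, b) = (a - b)^T M (a - b). It is supplied by the caller here;
    learning it is outside the scope of this sketch.
    """
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    diff = X_train - np.asarray(x, dtype=float)
    # Squared distance of every training sample to x under the metric M.
    d2 = np.einsum("ij,jk,ik->i", diff, M, diff)
    nearest = np.argsort(d2)[:k]

    # Assumption: weight each class by the inverse of its frequency,
    # so minority-class (defective) neighbours count for more.
    classes, counts = np.unique(y_train, return_counts=True)
    class_weight = {int(c): len(y_train) / n for c, n in zip(classes, counts)}

    # Assumption: closer neighbours also count for more (inverse distance).
    votes = {int(c): 0.0 for c in classes}
    for i in nearest:
        c = int(y_train[i])
        votes[c] += class_weight[c] / (d2[i] + eps)
    return max(votes, key=votes.get)


def recall_and_f_measure(y_true, y_pred, positive=1):
    """Return (recall, F-measure) for the positive (defective) class."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_true == positive) & (y_pred == positive))
    fp = np.sum((y_true != positive) & (y_pred == positive))
    fn = np.sum((y_true == positive) & (y_pred != positive))
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return float(recall), float(f_measure)


# Toy data (made up for illustration): four non-defective samples (label 0)
# and two defective samples (label 1); the identity matrix stands in for M.
X = [[0, 0], [0, 1], [1, 0], [1, 1], [5, 5], [5, 6]]
y = [0, 0, 0, 0, 1, 1]
label = weighted_knn_predict(X, y, [5.0, 5.5], np.eye(2), k=3)
```

With the identity matrix as `M`, the distance reduces to squared Euclidean distance. Substituting a matrix learned by a metric learning method changes which neighbours are considered close, while the voting step stays the same.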
