Abstract
Software defect prediction (SDP) is an important means of improving software quality. Many recent advances in machine learning have been applied to improve defect prediction performance. However, SDP datasets usually have an imbalanced class distribution, which can degrade prediction performance. To address this problem, we propose a software defect prediction method termed class-imbalance sparse reconstruction metric learning (CSRML). First, CSRML introduces a cost-sensitive factor into metric learning, so that a distance-metric feature matrix is learned while accounting for the different costs of misclassification in SDP. Second, a weight is added to the objective function to further improve the accuracy of distance metric learning for minority-class samples. Finally, to handle class imbalance in the prediction phase, an improved weighted KNN (IWKNN) algorithm is used to predict the labels of test samples. Experiments on the standard NASA SDP datasets show that the proposed method improves recall and F-measure, and with them classification performance.
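The abstract does not specify how IWKNN or the learned metric is defined, so the following is only a minimal sketch of the general idea behind the prediction step: a distance-weighted k-nearest-neighbour vote under a Mahalanobis-style metric, with optional per-class weights to favour the minority (defective) class. The function names, the inverse-distance vote, and the inverse-frequency class weights are all assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np


def inverse_frequency_weights(y):
    """Hypothetical class weights: rarer classes get larger weights."""
    labels, counts = np.unique(y, return_counts=True)
    n, c = len(y), len(labels)
    return {int(l): n / (c * cnt) for l, cnt in zip(labels, counts)}


def weighted_knn_predict(X_train, y_train, x, M, k=5, class_weights=None):
    """Predict the label of x by a weighted vote of its k nearest neighbours.

    Distances use d(a, b)^2 = (a - b)^T M (a - b), where M is assumed to be
    a symmetric positive semidefinite matrix (for example, one produced by
    some metric learning step). Each neighbour votes with weight
    1 / distance^2, optionally scaled by the weight of its class.
    """
    diffs = X_train - x
    # Squared distance of every training sample to x under the metric M.
    d2 = np.einsum("ij,jk,ik->i", diffs, M, diffs)
    nearest = np.argsort(d2)[:k]

    votes = {}
    for i in nearest:
        label = int(y_train[i])
        w = 1.0 / (d2[i] + 1e-12)  # small constant avoids division by zero
        if class_weights is not None:
            w *= class_weights[label]
        votes[label] = votes.get(label, 0.0) + w
    return max(votes, key=votes.get)
```

With `M` set to the identity matrix this reduces to ordinary distance-weighted KNN in Euclidean space; passing the output of `inverse_frequency_weights(y_train)` as `class_weights` is one simple way to counter class imbalance at prediction time.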
Authors
SHI Zuo-ting (史作婷)
WU Di (吴迪)
JING Xiao-yuan (荆晓远)
WU Fei (吴飞)
Affiliations: School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210003, China; State Key Laboratory of Software Engineering, School of Computer, Wuhan University, Wuhan 430072, China; School of Automation, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
Source
Computer Technology and Development (《计算机技术与发展》)
2018, No. 6, pp. 125-128, 136 (5 pages in total)
Funding
National Natural Science Foundation of China (61272273)