Abstract
Defect prediction for software code is a well-studied research problem, but defect prediction for protocol code has not yet been investigated. This paper proposes an enhanced supervised cross-domain protocol defect prediction (ESCPDP) algorithm to address class imbalance and feature redundancy in cross-domain defect prediction. First, the Mean-ReSMOTE algorithm is proposed to resolve class imbalance in the dataset. Second, the Hybrid-RFE+ algorithm is proposed to perform feature selection on the oversampled data and obtain the optimal feature subset. Finally, a support vector machine (SVM) is used to build the supervised defect prediction model. The model is validated on the NASA dataset and on a Net protocol defect dataset collected and constructed by the authors, using Acc, Recall, and F1 as evaluation metrics. The experimental results show that ESCPDP outperforms other classical algorithms and achieves better prediction performance.
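The abstract does not give the details of Mean-ReSMOTE or Hybrid-RFE+, so the following is only a minimal sketch of two generic building blocks it mentions: plain SMOTE-style oversampling by interpolation between minority-class neighbours, and the Acc, Recall, and F1 metrics for a binary defect label. The function names `smote_like_oversample` and `binary_metrics`, the parameter `k`, and the toy data are assumptions made for illustration and are not taken from the paper.

```python
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Generic SMOTE-style oversampling (not the paper's Mean-ReSMOTE).

    Each synthetic sample lies on the line segment between a randomly
    chosen minority sample and one of its k nearest minority neighbours.
    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


def binary_metrics(y_true, y_pred):
    """Return (accuracy, recall, F1), treating label 1 as the defective class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    acc = (tp + tn) / len(pairs)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return acc, recall, f1


if __name__ == "__main__":
    # Toy minority class: four defective modules with two features each.
    minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(len(smote_like_oversample(minority, n_new=6)))
    print(binary_metrics([1, 1, 0, 0], [1, 0, 0, 0]))
```

In a full pipeline of the kind the abstract outlines, the synthetic samples would be appended to the training set before feature selection and SVM training; those later stages are omitted here because the abstract does not specify them.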
Authors
周超
王震
秦富童
刘义
ZHOU Chao; WANG Zhen; QIN Futong; LIU Yi (Unit 63891 of PLA, China)
Source
《计算机工程与应用》
CSCD
Peking University Core Journals (北大核心)
2023, Issue 16, pp. 256-261 (6 pages)
Computer Engineering and Applications
Keywords
defect prediction
class imbalance
oversampling
feature selection
supervised learning