Abstract
Software defect prediction is an important means of ensuring software quality and improving testing efficiency, and accurately locating software modules with potential defects has gradually become a research hotspot in software engineering. Targeting software data that arrives as a dynamic data stream, and considering the severe imbalance between positive and negative class examples in that stream, this paper proposes SCS (Class Imbalance Mitigation Algorithms), a class imbalance mitigation method for dynamic software data streams. The method obtains the software data stream in time-series order and combines oversampling with cost-sensitive learning to widen the prediction model's search range for potentially defective data. Experimental results show that SCS effectively alleviates the class imbalance problem. Its accuracy exceeds that of traditional machine learning algorithms by 10%-20% and that of dynamic incremental learning algorithms by 5%-10%; its false alarm rate is roughly 5%-15% lower than that of the other learning algorithms; and its AUC is stable at around 0.63-0.73.
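The abstract names two ingredients, oversampling and cost-sensitive learning, applied to chunks of a time-ordered stream, but gives no algorithmic details. The following is only a minimal sketch of how those two ideas can be combined in general, not the SCS algorithm itself. The function names, the `target_ratio` parameter, the random-duplication oversampling, and the inverse-frequency cost formula are all assumptions introduced for illustration.

```python
import random
from collections import Counter


def oversample(chunk, target_ratio, rng):
    """Duplicate random minority rows until the minority count is about
    target_ratio times the majority count (hypothetical helper)."""
    counts = Counter(label for _, label in chunk)
    if len(counts) != 2:
        return list(chunk)  # single-class chunk: nothing to balance
    (_, n_major), (minor, n_minor) = counts.most_common(2)
    pool = [row for row in chunk if row[1] == minor]
    needed = max(0, int(target_ratio * n_major) - n_minor)
    return list(chunk) + [rng.choice(pool) for _ in range(needed)]


def cost_weights(chunk):
    """Inverse-frequency misclassification costs, so errors on the rarer
    class cost more (an assumed weighting, not taken from the paper)."""
    counts = Counter(label for _, label in chunk)
    total = len(chunk)
    return {label: total / (len(counts) * n) for label, n in counts.items()}


def prepare_stream(chunks, target_ratio=0.5, seed=0):
    """Yield (resampled_chunk, class_costs) for each chunk in arrival order."""
    rng = random.Random(seed)
    for chunk in chunks:  # chunks are assumed to arrive in time order
        resampled = oversample(chunk, target_ratio, rng)
        yield resampled, cost_weights(resampled)
```

Oversampling only part of the way to balance and covering the remaining gap with class costs is one plausible design; the costs returned here could be passed as per-class weights to any incremental classifier that accepts them.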
Authors
王文彪
张春英
马英硕
WANG Wen-biao; ZHANG Chun-ying; MA Ying-shuo (College of Science, North China University of Technology, Tangshan, Hebei 063210, China; Key Laboratory of Data Science and Application of Hebei, Tangshan, Hebei 063210, China; Tangshan Key Laboratory of Data Science, Tangshan, Hebei 063210, China)
Source
《华北理工大学学报(自然科学版)》
CAS
2023, No. 3, pp. 82-89 (8 pages)
Journal of North China University of Science and Technology:Natural Science Edition
Funding
Natural Science Foundation of Hebei Province (F2018209374).
Keywords
software defect prediction
class imbalance problem
oversampling technology
cost-sensitive technology