Abstract
To address data quality problems in software defect prediction, such as prediction algorithms that are hard to compare and measurement data that are inconsistent, both caused by the coexistence of multiple accuracy-related metrics, a binary classification quality check (BCQualityCheck) method based on the confusion matrix is proposed. The method evaluates the quality of accuracy-related prediction performance data in terms of correctness and consistency, and it can also be used to convert between different accuracy-related metrics. BCQualityCheck was applied to 1,633 valid software defect prediction experimental results drawn from 35 papers. The results show that 16% of the prediction performance data, spread across 16 papers, have quality problems of varying severity. This supports the effectiveness of BCQualityCheck and indicates that evaluating the quality of prediction performance data is necessary.
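The abstract does not spell out the checking rules, so the following is only a minimal sketch of what a confusion-matrix-based consistency check and metric conversion might look like. It assumes that a paper reports precision, recall, and F1 together, and that the class sizes of the data set are known. All names (`Confusion`, `f1_score`, `is_consistent`, `confusion_from_rates`) and the tolerance value are hypothetical and are not taken from BCQualityCheck itself.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Confusion matrix cells for a binary (defective / clean) classifier."""
    tp: float
    fp: float
    fn: float
    tn: float

    def precision(self) -> float:
        # Undefined when nothing is predicted defective (tp + fp == 0).
        return self.tp / (self.tp + self.fp)

    def recall(self) -> float:
        return self.tp / (self.tp + self.fn)

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.fn + self.tn)


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def is_consistent(precision: float, recall: float,
                  reported_f1: float, tol: float = 0.005) -> bool:
    """Check a reported F1 against the value implied by precision and recall.

    tol is an assumed allowance for rounding in published tables.
    """
    return abs(f1_score(precision, recall) - reported_f1) <= tol


def confusion_from_rates(recall: float, false_positive_rate: float,
                         n_defective: int, n_clean: int) -> Confusion:
    """Rebuild the confusion matrix from recall, false positive rate,
    and the class sizes, so that other metrics can be derived from it."""
    tp = recall * n_defective
    fp = false_positive_rate * n_clean
    return Confusion(tp=tp, fp=fp, fn=n_defective - tp, tn=n_clean - fp)
```

For example, `is_consistent(0.8, 0.6, 0.9)` flags a reported F1 of 0.9 as inconsistent with precision 0.8 and recall 0.6, and `confusion_from_rates(0.8, 0.1, 100, 900).accuracy()` converts a reported recall and false positive rate into an accuracy figure.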
Authors
李宁
郭育晨
王小玲
张利军
LI Ning; GUO Yuchen; WANG Xiaoling; ZHANG Lijun (School of Computer, Northwestern Polytechnical University, Xi’an 710129, China; School of Computer Science and Technology, Xi’an Jiaotong University, Xi’an 710049, China)
Source
《华中科技大学学报(自然科学版)》
EI
CAS
CSCD
PKU Core (Peking University Core Journals)
2020, No. 11, pp. 24-29 (6 pages)
Journal of Huazhong University of Science and Technology(Natural Science Edition)
Funding
National Natural Science Foundation of China (61972317)
National Natural Science Foundation of China, Young Scientists Fund (61402370)
Keywords
software defect prediction
prediction performance
confusion matrix
data correctness
data consistency