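Abstract: Just-in-time (JIT) software defect prediction predicts whether a code commit made during project development and maintenance will introduce a defect. In this research area, model training depends on high-quality datasets, yet existing JIT defect prediction methods have not studied how dataset augmentation affects prediction. To improve the performance of JIT defect prediction, this paper proposes a prediction method based on data augmentation (PDA). PDA comprises four parts: feature concatenation, sample generation, sample filtering, and sampling. The augmented dataset has sufficient samples, high sample quality, and no class imbalance. Compared with the state-of-the-art JIT defect prediction method JIT-Fine, PDA improves the F1 score by 18.33% on the JIT-Defects4J dataset and still improves it by 3.67% on the LLTC4J dataset, which verifies the generalization ability of PDA. Ablation experiments show that the performance gain comes mainly from the dataset augmentation and filtering mechanisms.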
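As a rough illustration of the sampling step and the F1 metric mentioned in the abstract, the sketch below balances a toy dataset by random oversampling and computes F1 from predictions. The abstract does not say which sampling technique PDA uses, so random oversampling is an assumption here, and the function names `oversample_minority` and `f1_score` and the toy commit labels are hypothetical.

```python
import random


def oversample_minority(samples, labels, seed=0):
    """Randomly duplicate samples of smaller classes until all classes match
    the size of the largest class. (Assumed technique, not confirmed as PDA's.)"""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    target = max(len(group) for group in by_class.values())
    out_samples, out_labels = [], []
    for label, group in by_class.items():
        # Draw extra copies with replacement from the same class.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for sample in group + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


def f1_score(y_true, y_pred, positive=1):
    """F1 = harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: four clean commits (0) and one defect-inducing commit (1).
commits = ["c1", "c2", "c3", "c4", "c5"]
labels = [0, 0, 0, 0, 1]
balanced_commits, balanced_labels = oversample_minority(commits, labels)
example_f1 = f1_score([1, 1, 0, 0], [1, 0, 1, 0])
```

After oversampling, both classes in the toy data contain four samples each. In practice a sampler would be applied to the training split only, so that the evaluation data keeps its original class distribution.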
Funding: Partially supported by the National Natural Science Foundation of China (Grant No. 91125010).
Abstract: This paper presents a new correction method, the "instant correction method" (ICM), to improve the accuracy of numerical prediction products (NPP) and provide weather variables at grid cells. The ICM exploits the continuity in time of forecast errors at different forecast times to improve the accuracy of large-scale NPP. To apply the ICM in China, an ensemble correction scheme is designed to correct the T213 NPP (the most popular NPP in China) through different statistical methods. The corrected T213 NPP (ICM T213 NPP) are evaluated with four popular indices: correlation coefficient, climate anomaly correlation coefficient, root-mean-square error (RMSE), and confidence intervals (CI). The results show that the ICM T213 NPP are more accurate than the original T213 NPP in both the training period (2003–2008) and the validation period (2009–2010). Applications in China over the past three years indicate that the ICM is simple, fast, and reliable. Because of its low computing cost, end users who need more accurate short-range weather forecasts around China can benefit greatly from the method.
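The idea of using the continuity in time of forecast errors can be illustrated with a minimal sketch. The abstract does not give the ICM equations or the ensemble scheme, so the code below is not the authors' method: it assumes the simplest possible stand-in, subtracting the most recently observed forecast error from the next forecast, and scores the result with RMSE. The function names and the temperature values are hypothetical.

```python
from math import sqrt


def rmse(predictions, observations):
    """Root-mean-square error between two equally long sequences."""
    squared = [(p - o) ** 2 for p, o in zip(predictions, observations)]
    return sqrt(sum(squared) / len(squared))


def persistence_correct(forecasts, observations):
    """Correct each forecast with the error observed at the previous step.
    (Assumed simple error-persistence rule, not the published ICM scheme.)"""
    corrected = [forecasts[0]]  # no earlier error exists for the first step
    for t in range(1, len(forecasts)):
        previous_error = forecasts[t - 1] - observations[t - 1]
        corrected.append(forecasts[t] - previous_error)
    return corrected


# Hypothetical temperatures: the raw forecast carries a steady +2 degree bias.
observations = [10.0, 11.0, 12.0, 13.0, 14.0]
forecasts = [12.0, 13.0, 14.0, 15.0, 16.0]

corrected = persistence_correct(forecasts, observations)
raw_rmse = rmse(forecasts, observations)
corrected_rmse = rmse(corrected, observations)
```

With a perfectly persistent bias, every step after the first is corrected exactly, so the corrected RMSE falls below the raw RMSE. Real forecast errors are only partly persistent, which is presumably why the paper combines several statistical methods in an ensemble.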