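Abstract
Software defect prediction plays a pivotal role in improving software quality and in controlling and balancing software cost, and it is an active area of software engineering. Researchers have proposed many prediction techniques that address different problems at different levels, but several issues remain open: the imbalanced distribution of software defect data, unequal misclassification costs, and the difficulty of sharing defect-prediction experience across projects. To address these issues, this paper proposes a transfer-learning-based method for sharing software defect prediction experience. The method builds on the well-known transfer learning algorithm TrAdaBoost and adds misclassification costs to raise the detection rate of fault-prone modules. It also applies different weight-update strategies to the target project data and the auxiliary project data, so that their different influences on defect prediction for the target software are kept distinct. Simulation experiments on the JM1 and KC2 datasets from NASA (U.S. National Aeronautics and Space Administration) software engineering projects show that the method outperforms comparable methods in prediction performance, with good prediction results and strong stability. The experimental results indicate that, in similar software development environments, development teams can effectively share and inherit rich software defect experience and thereby improve the quality of their software products.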
Software fault-prone prediction, an active research topic in the software engineering domain, plays an important role in improving software quality and in controlling and balancing software cost. A large number of fault prediction studies have addressed different problems from different aspects. However, several problems remain open, such as imbalanced software fault data, unequal misclassification costs, and the difficulty of sharing fault-prone prediction experience across projects. To solve these problems, this paper presents a method, based on transfer learning, for sharing fault-prone prediction experience across projects. The proposed method extends the well-known transfer learning framework TrAdaBoost by adding misclassification costs to improve the probability of detecting fault-prone modules. It adopts different weight-update strategies for the target project data and the auxiliary project data to reflect their different roles. The proposed method is compared with multiple existing data mining and machine learning approaches on the JM1 and KC2 data sets from the NASA Metrics Data Program repository. The comparative experimental results show that the proposed method outperforms the others in effectiveness and stability. The simulation results indicate that software development teams working in similar, homogeneous domains and processes can share fault-prone prediction experience and thus improve software quality effectively.
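The abstract describes the method only at a high level, so the following is a minimal sketch under stated assumptions, not the authors' implementation. It shows the standard TrAdaBoost loop: weights of misclassified auxiliary-project modules are shrunk by a fixed factor, while weights of misclassified target-project modules are grown. On top of that it adds one plausible cost-sensitive term: the hypothetical parameter `cost_fp` gives an extra weight boost to target modules that are fault-prone (label 1) but were predicted clean. The paper's actual cost formulation is not given here. The decision-stump weak learner, the clamping of the error rate, and all function names (`fit_stump`, `predict_stump`, `cost_sensitive_tradaboost`, `predict`) are illustrative assumptions.

```python
import math
from typing import List, Tuple

Stump = Tuple[int, float, int]  # (feature index, threshold, polarity)


def predict_stump(stump: Stump, row: List[float]) -> int:
    """Return 1 (fault-prone) or 0 (not fault-prone) for one module."""
    j, thr, pol = stump
    return 1 if pol * (row[j] - thr) >= 0 else 0


def fit_stump(X: List[List[float]], y: List[int], p: List[float]) -> Stump:
    """Weighted decision stump: the split with the lowest weighted error."""
    best, best_err = (0, 0.0, 1), float("inf")
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for pol in (1, -1):
                err = sum(pi for row, yi, pi in zip(X, y, p)
                          if predict_stump((j, thr, pol), row) != yi)
                if err < best_err:
                    best, best_err = (j, thr, pol), err
    return best


def cost_sensitive_tradaboost(X_aux, y_aux, X_tgt, y_tgt,
                              n_rounds: int = 10, cost_fp: float = 2.0):
    """TrAdaBoost plus a hypothetical extra cost on missed fault-prone target modules."""
    n, m = len(X_aux), len(X_tgt)
    X, y = X_aux + X_tgt, y_aux + y_tgt
    w = [1.0] * (n + m)
    # Fixed down-weighting factor for auxiliary data, as in standard TrAdaBoost.
    beta_aux = 1.0 / (1.0 + math.sqrt(2.0 * math.log(n) / n_rounds))
    learners = []
    for _ in range(n_rounds):
        total = sum(w)
        stump = fit_stump(X, y, [wi / total for wi in w])
        miss = [abs(predict_stump(stump, row) - yi) for row, yi in zip(X, y)]
        # The error rate is measured on the target project only.
        eps = sum(w[n + i] * miss[n + i] for i in range(m)) / sum(w[n:])
        eps = min(max(eps, 1e-10), 0.499)  # assumption: clamp so 0 < beta_t < 1
        beta_t = eps / (1.0 - eps)
        learners.append((stump, beta_t))
        for i in range(n):          # auxiliary: shrink weights of misclassified modules
            w[i] *= beta_aux ** miss[i]
        for i in range(n, n + m):   # target: grow weights of misclassified modules
            w[i] *= beta_t ** (-miss[i])
            if miss[i] and y[i] == 1:
                w[i] *= cost_fp     # hypothetical cost term for a missed fault-prone module
    return learners


def predict(learners, row: List[float]) -> int:
    """Weighted vote of the later half of the rounds, as in TrAdaBoost."""
    start = math.ceil(len(learners) / 2) - 1
    score = threshold = 0.0
    for stump, beta_t in learners[start:]:
        alpha = math.log(1.0 / beta_t)
        score += alpha * predict_stump(stump, row)
        threshold += 0.5 * alpha
    return 1 if score >= threshold else 0


# Toy usage: one metric per module; auxiliary and target projects agree here.
X_aux = [[1.0], [2.0], [3.0], [6.0], [7.0], [8.0]]
y_aux = [0, 0, 0, 1, 1, 1]
X_tgt = [[1.0], [4.0], [6.0], [9.0]]
y_tgt = [0, 0, 1, 1]
model = cost_sensitive_tradaboost(X_aux, y_aux, X_tgt, y_tgt)
print([predict(model, r) for r in X_tgt])  # [0, 0, 1, 1]
```

Setting `cost_fp = 1.0` recovers plain TrAdaBoost. Values above 1 make later rounds focus harder on fault-prone target modules that are still being missed, which is one simple way to trade some false alarms for a higher detection rate on imbalanced defect data.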
Source
Journal of Chinese Computer Systems (《小型微型计算机系统》)
CSCD
Peking University Core Journals (北大核心)
2014, No. 11, pp. 2416-2421 (6 pages)
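Funding
Supported by the National Natural Science Foundation of China (61163007, 61262010)
Supported by the Natural Science Foundation of Jiangxi Province (20114BAB211019, 20132BAB201036)
Supported by the Science and Technology Project of the Jiangxi Provincial Department of Education (GJJ12731, GJJ13305, GJJ12743)
Supported by the Jiangxi High-Level Engineering Research Center for E-Commerce project (Business Intelligence Research Based on Imbalanced Big Data)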
Keywords
software fault-prone prediction
software metrics
transfer learning
imbalanced data
cost-sensitive