
Cross-project defect prediction method based on feature selection and TrAdaBoost (cited by: 4)
Abstract Cross-project software defect prediction can address the shortage of training data in the project being predicted. However, the source project and the target project usually differ considerably in data distribution, which reduces prediction performance. To address this problem, a Cross-Project Defect Prediction method based on Feature Selection and TrAdaBoost (CPDP-FSTr) was proposed. Firstly, in the feature selection stage, Kernel Principal Component Analysis (KPCA) was used to remove redundant data from the source project. Then, based on the attribute feature distributions of the source and target projects, the candidate source project data closest in distribution to the target project were selected by distance. Finally, in the instance transfer stage, a TrAdaBoost method improved with an evaluation factor was used to find the source project instances whose distribution was similar to that of the few labeled instances in the target project, and a defect prediction model was built. With F1 as the evaluation metric, compared with cross-project software defect prediction using Feature Clustering and TrAdaBoost (FeCTrA) and Cross-project software defect prediction based on Multiple Kernel Ensemble Learning (CMKEL), CPDP-FSTr improved prediction performance by 5.84% and 105.42% respectively on the AEEEM dataset and by 5.25% and 85.97% respectively on the NASA dataset, and its two-stage feature selection outperformed a single feature selection process. Experimental results show that CPDP-FSTr achieves good prediction performance when the source project feature selection proportion is 60% and the target project labeled instance proportion is 20%.
Authors LI Li; SHI Kexin; REN Zhenkang (College of Information and Computer Engineering, Northeast Forestry University, Harbin, Heilongjiang 150040, China)
Source Journal of Computer Applications (《计算机应用》), indexed in CSCD and the Peking University Core Journals list, 2022, No. 5, pp. 1554-1562 (9 pages)
Keywords cross-project defect prediction; feature selection; Kernel Principal Component Analysis (KPCA); instance transfer; TrAdaBoost
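
The instance transfer stage and the F1 metric mentioned in the abstract can be illustrated with a minimal sketch. It assumes the original TrAdaBoost weight update (source instances that the weak learner misclassifies are down-weighted, misclassified labeled target instances are up-weighted); the evaluation-factor improvement used by CPDP-FSTr is not defined in the abstract and is therefore not reproduced here. The function names `f1_score` and `tradaboost_round` and their parameters are hypothetical, chosen only for this illustration.

```python
import math


def f1_score(y_true, y_pred):
    """F1 for the positive class (1 = defective, 0 = clean)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def tradaboost_round(w_src, w_tgt, miss_src, miss_tgt, n_rounds):
    """One weight update of the original TrAdaBoost (not the paper's variant).

    w_src, w_tgt       current weights of source / labeled target instances
    miss_src, miss_tgt 1 where the current weak learner is wrong, else 0
    n_rounds           total number of boosting rounds N
    """
    n = len(w_src)
    # Weighted error is measured on the labeled target instances only.
    eps = sum(w * m for w, m in zip(w_tgt, miss_tgt)) / sum(w_tgt)
    # Assumption of this sketch: clip eps so that beta_t stays in (0, 1).
    eps = min(max(eps, 1e-10), 0.499)
    beta_t = eps / (1.0 - eps)
    # Fixed factor for source instances; it depends only on n and N.
    beta = 1.0 / (1.0 + math.sqrt(2.0 * math.log(n) / n_rounds))
    # Misclassified source instances lose weight (beta < 1 when n > 1).
    new_src = [w * beta ** m for w, m in zip(w_src, miss_src)]
    # Misclassified target instances gain weight (1 / beta_t > 1).
    new_tgt = [w * beta_t ** (-m) for w, m in zip(w_tgt, miss_tgt)]
    return new_src, new_tgt, beta_t
```

In a full training loop, a weak learner would be refit on the reweighted source and target instances after each call, and the final classifier would vote over the later rounds; both steps are omitted here to keep the sketch short.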