Abstract
Cross-project software defect prediction can address the shortage of training data in the project being predicted. However, the source and target projects usually differ considerably in data distribution, which degrades prediction performance. To address this problem, a Cross-Project Defect Prediction method based on Feature Selection and TrAdaBoost (CPDP-FSTr) was proposed. First, in the feature selection stage, Kernel Principal Component Analysis (KPCA) was used to remove redundant data from the source project. Then, based on the attribute feature distributions of the source and target projects, the candidate source project data closest in distribution to the target project were selected by distance. Finally, in the instance transfer stage, a TrAdaBoost method improved with an evaluation factor was used to find the source project instances whose distribution was similar to that of the few labeled instances in the target project, and a defect prediction model was built. With F1 as the evaluation metric, compared with cross-project software defect prediction using Feature Clustering and TrAdaBoost (FeCTrA) and Cross-project software defect prediction based on Multiple Kernel Ensemble Learning (CMKEL), CPDP-FSTr improved prediction performance by 5.84% and 105.42%, respectively, on the AEEEM dataset and by 5.25% and 85.97%, respectively, on the NASA dataset, and its two-stage feature selection outperformed single-stage feature selection. Experimental results show that CPDP-FSTr achieves good prediction performance when the source project feature selection proportion is 60% and the proportion of labeled target project instances is 20%.
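The instance transfer stage relies on TrAdaBoost reweighting. The abstract does not specify how the evaluation factor changes that step, so the following is only a minimal sketch of one weight update in the standard TrAdaBoost algorithm, not the improved variant used in CPDP-FSTr. The function name `tradaboost_update`, its argument names, and the clamping of the error rate are assumptions made for illustration.

```python
import math


def tradaboost_update(w_src, w_tgt, miss_src, miss_tgt, n_rounds):
    """One weight update of standard TrAdaBoost (hypothetical helper).

    w_src, w_tgt: current weights of source and labeled target instances.
    miss_src, miss_tgt: 1 if the weak learner misclassified the instance, else 0.
    n_rounds: total number of boosting rounds.
    """
    # The weighted error is measured on the labeled target instances only.
    eps = sum(w * m for w, m in zip(w_tgt, miss_tgt)) / sum(w_tgt)
    # Assumption: clamp eps, since the update needs 0 < eps < 0.5.
    eps = min(max(eps, 1e-10), 0.499)
    beta_tgt = eps / (1.0 - eps)
    # Fixed shrink factor for source instances, based on source size and rounds.
    beta_src = 1.0 / (1.0 + math.sqrt(2.0 * math.log(len(w_src)) / n_rounds))
    # Misclassified source instances lose weight (they fit the target poorly).
    new_src = [w * beta_src ** m for w, m in zip(w_src, miss_src)]
    # Misclassified target instances gain weight, as in AdaBoost.
    new_tgt = [w * beta_tgt ** (-m) for w, m in zip(w_tgt, miss_tgt)]
    return new_src, new_tgt


# Toy run: 4 source and 4 labeled target instances, 10 boosting rounds.
new_src, new_tgt = tradaboost_update(
    [1.0] * 4, [1.0] * 4, [1, 0, 0, 1], [0, 1, 0, 0], 10
)
print(new_src, new_tgt)
```

Over repeated rounds, source instances that the weak learners keep misclassifying fade out, which is how the method retains only source instances whose distribution resembles the labeled target data.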
Authors
LI Li; SHI Kexin; REN Zhenkang (College of Information and Computer Engineering, Northeast Forestry University, Harbin, Heilongjiang 150040, China)
Source
Journal of Computer Applications (《计算机应用》)
CSCD
Peking University Core Journals (北大核心)
2022, No. 5, pp. 1554-1562 (9 pages)
Keywords
cross-project defect prediction
feature selection
Kernel Principal Component Analysis (KPCA)
instance transfer
TrAdaBoost