Abstract
To address class imbalance and the large number of irrelevant and redundant features in semi-supervised software defect prediction, this paper proposes an improved semi-supervised ensemble method for software defect prediction, FeSSTri (semi-supervised software prediction using Feature Selecting and Sample and Tri-training). First, the ADASYN adaptive synthetic oversampling algorithm is applied to the labeled samples to address the class imbalance of the dataset. Second, a classifier built on the oversampled data pre-labels the unlabeled data; the labeled and pre-labeled samples are then combined, and the minimum-redundancy maximum-relevance (mRMR) algorithm performs feature selection on the combined dataset to remove irrelevant and redundant features. Finally, the semi-supervised ensemble algorithm Tri-training is used to build the final semi-supervised defect prediction model. The proposed model is evaluated on the NASA and AEEEM datasets with the F1 score as the evaluation metric. The experimental results show that FeSSTri outperforms the original Tri-training algorithm and also achieves better prediction results than classic machine learning methods.
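To make the mRMR feature-selection step named in the abstract more concrete, the following is a minimal sketch of greedy mRMR in its difference form (relevance minus mean redundancy). It is not the authors' implementation: the function names `mutual_information` and `mrmr_select` and the toy data are illustrative assumptions, and the sketch assumes features that are already discrete, whereas real defect metrics would typically need discretization first.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * log(c * n / (px[x] * py[y])) for (x, y), c in pxy.items())


def mrmr_select(columns, labels, k):
    """Greedily pick k feature indices: high relevance to labels, low redundancy.

    Assumption: `columns` is a list of feature columns (one list per feature),
    each holding discrete values and having the same length as `labels`.
    """
    relevance = [mutual_information(col, labels) for col in columns]
    # Start with the single most relevant feature.
    selected = [max(range(len(columns)), key=lambda j: relevance[j])]
    while len(selected) < min(k, len(columns)):
        def score(j):
            # Mean mutual information with the features chosen so far.
            redundancy = sum(
                mutual_information(columns[j], columns[s]) for s in selected
            ) / len(selected)
            return relevance[j] - redundancy

        candidates = [j for j in range(len(columns)) if j not in selected]
        selected.append(max(candidates, key=score))
    return selected


# Toy data (hypothetical): feature 1 duplicates feature 0, feature 2 is independent.
f0 = [0, 0, 0, 0, 1, 1, 1, 1]
f1 = list(f0)
f2 = [0, 0, 1, 1, 0, 0, 1, 1]
labels = [a & b for a, b in zip(f0, f2)]  # defective only when both flags are set
columns = [f0, f1, f2]
chosen = mrmr_select(columns, labels, 2)
print(sorted(chosen))
```

In this toy setting the duplicated feature is penalized for its redundancy with feature 0, so the two selected features are the non-redundant pair. In the pipeline described above, the labels passed to such a routine would include the pre-labels assigned to the unlabeled samples.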
Authors
周建含
李英梅
李文昊
ZHOU Jian-han; LI Ying-mei; LI Wen-hao (School of Computer Science and Information Engineering, Harbin Normal University, Harbin 150000, China)
Source
《小型微型计算机系统》
CSCD
Peking University Core Journal
2021, Issue 10, pp. 2196-2202 (7 pages)
Journal of Chinese Computer Systems
Funding
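Natural Science Foundation of Heilongjiang Province (F2017021)
Research Project of the School of Computer Science, Harbin Normal University (JKYKYY202003)
Graduate Innovation Research Project of Harbin Normal University (HSDSSCX2020-58)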
Keywords
software defect prediction
class imbalance
feature selection
semi-supervised prediction
machine learning