Abstract
Class imbalance in software defect data limits classifier performance. To address this, the paper introduces the POSS (Pareto optimization for subset selection) feature selection algorithm and random undersampling into software defect prediction, and builds the prediction model with a support vector machine (SVM). The experimental results show that repeated random undersampling effectively mitigates the imbalance in software defect data, while POSS, which treats subset selection as a bi-objective optimization, improves classification accuracy. The proposed method outperforms the Relief, Fisher, and MI (mutual information) feature selection algorithms.
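The abstract names two building blocks, repeated random undersampling and POSS, without giving their details. The following is a minimal sketch of how they might fit together, not the authors' implementation. The function names (`random_undersample`, `poss`, `centroid_error`), the synthetic data, the iteration count, and the rule that discards subsets of size 2k or more are all assumptions made for illustration. The paper scores subsets with an SVM; a nearest-centroid error is used here as a hypothetical stand-in so the example needs only the standard library. Averaging the loss over several undersampled sets is one possible reading of "multiple random undersampling".

```python
import random

def random_undersample(X, y, rng):
    """Balance a binary dataset by keeping a random subset of the majority class."""
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    keep = minority + rng.sample(majority, len(minority))
    rng.shuffle(keep)
    return [X[i] for i in keep], [y[i] for i in keep]

def centroid_error(X, y, subset):
    """Training error of a nearest-centroid classifier on the selected features.
    Hypothetical stand-in for the SVM evaluation used in the paper."""
    cols = [j for j, used in enumerate(subset) if used]
    if not cols:
        return 1.0  # no features selected: assign the worst score
    centroids = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    errors = 0
    for x, t in zip(X, y):
        dist = {c: sum((x[j] - m) ** 2 for j, m in zip(cols, mu))
                for c, mu in centroids.items()}
        errors += min(dist, key=dist.get) != t
    return errors / len(y)

def poss(n_features, k, loss, iterations, rng):
    """Sketch of Pareto optimization for subset selection.
    Minimizes (loss, subset size) jointly and returns the best subset of size <= k."""
    empty = (False,) * n_features
    population = {empty: (loss(empty), 0)}  # subset -> (loss, size)
    for _ in range(iterations):
        parent = rng.choice(list(population))
        # Flip each bit independently with probability 1/n.
        child = tuple(bit != (rng.random() < 1.0 / n_features) for bit in parent)
        size = sum(child)
        if child in population or size >= 2 * k:  # assumed infeasibility cutoff
            continue
        cand = (loss(child), size)
        # Reject the child if an archived solution strictly dominates it.
        if any(f[0] <= cand[0] and f[1] <= cand[1] and f != cand
               for f in population.values()):
            continue
        # Drop archived solutions that the child weakly dominates, then add it.
        population = {s: f for s, f in population.items()
                      if not (cand[0] <= f[0] and cand[1] <= f[1])}
        population[child] = cand
    feasible = [(f[0], s) for s, f in population.items() if f[1] <= k]
    return min(feasible)[1]

rng = random.Random(0)
# Synthetic imbalanced data: 200 clean modules, 20 defective; only feature 0 is informative.
X, y = [], []
for label, count in ((0, 200), (1, 20)):
    for _ in range(count):
        row = [rng.gauss(0, 1) for _ in range(6)]
        row[0] += 4 * label
        X.append(row)
        y.append(label)

# Draw several balanced samples and average the loss over them.
samples = [random_undersample(X, y, rng) for _ in range(5)]

def loss(subset):
    return sum(centroid_error(Xs, ys, subset) for Xs, ys in samples) / len(samples)

selected = poss(n_features=6, k=2, loss=loss, iterations=300, rng=rng)
print("selected feature indices:", [j for j, used in enumerate(selected) if used])
```

In practice the stand-in scorer would be replaced by cross-validated SVM error on each undersampled set, which is closer to what the abstract describes.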
Source
Journal of Shandong University (Engineering Science)
Indexed in: CAS; PKU Core (Peking University core journal index)
2017, Issue 1, pp. 15-21 (7 pages)
Funding
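Natural Science Foundation of Jiangsu Province (BK20131378, BK20140885)
Guangxi Colleges and Universities Key Laboratory of Cloud Computing and Complex Systems (15206)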
Keywords
software defect prediction
class imbalance
data sampling
feature selection