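Abstract
To address the small number of samples and the asymmetric class distribution in software defect prediction, a software defect prediction method based on a balanced and biased support vector machine is proposed. The method performs semi-supervised learning on labeled and unlabeled sample sets, using a biased support vector machine for generalization learning on a small, asymmetric labeled sample set. During the iterations of semi-supervised learning, a resampling strategy balances the sample set to eliminate the effect of the large, asymmetric unlabeled sample set on defect prediction performance. Experimental results on benchmark datasets show that the method can effectively predict software defects on class-imbalanced sample sets.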
There are two important issues in software defect prediction. First, it is difficult to collect a large amount of labeled training data to learn a good model. Second, the data set is always imbalanced, since a software system contains far fewer defective modules than non-defective ones. To solve these two problems, this paper proposes a novel semi-supervised learning approach named Balanced and Biased Support Vector Machine (B2SVM). The method exploits the abundant unlabeled samples to improve prediction accuracy and employs a sampling technique to handle the class-imbalance problem during the Biased Support Vector Machine (BSVM) learning process. Experimental results on a class-imbalanced dataset show that the method can perform software defect prediction on class-imbalanced sample sets.
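The abstract does not give the algorithm in detail, so the following is only a minimal sketch of the general idea, not the authors' B2SVM implementation. It assumes a linear SVM with class-dependent hinge penalties (standing in for the biased SVM), trained by plain subgradient descent, and a self-training loop that pseudo-labels the unlabeled samples and balances them by random undersampling before retraining. All function names, the penalties `c_pos` and `c_neg`, the learning-rate schedule, and the choice of undersampling are illustrative assumptions.

```python
import random

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def predict(w, b, x):
    # +1 = defective, -1 = non-defective (labeling convention assumed here)
    return 1 if dot(w, x) + b >= 0 else -1

def train_biased_svm(X, y, c_pos=4.0, c_neg=1.0, lam=0.01, epochs=100, seed=0):
    """Linear SVM whose hinge penalty is larger for the rare positive class.

    Assumed stand-in for a biased SVM; trained by subgradient descent.
    """
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for epoch in range(epochs):
        rng.shuffle(order)
        lr = 0.1 / (1 + epoch)  # decaying step size (assumption)
        for i in order:
            cost = c_pos if y[i] == 1 else c_neg
            violated = y[i] * (dot(w, X[i]) + b) < 1
            w = [wj * (1 - lr * lam) for wj in w]  # L2 regularization step
            if violated:
                w = [wj + lr * cost * y[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * cost * y[i]
    return w, b

def balance(X, y, rng):
    """Randomly undersample the larger class so both classes are equal in size."""
    pos = [i for i, t in enumerate(y) if t == 1]
    neg = [i for i, t in enumerate(y) if t == -1]
    n = min(len(pos), len(neg))
    keep = rng.sample(pos, n) + rng.sample(neg, n)
    return [X[i] for i in keep], [y[i] for i in keep]

def b2svm_sketch(X_lab, y_lab, X_unlab, rounds=5, seed=0):
    """Self-training: pseudo-label, balance the pseudo-labeled set, retrain."""
    rng = random.Random(seed)
    w, b = train_biased_svm(X_lab, y_lab, seed=seed)
    for _ in range(rounds):
        pseudo = [predict(w, b, x) for x in X_unlab]
        X_bal, y_bal = balance(X_unlab, pseudo, rng)
        w, b = train_biased_svm(X_lab + X_bal, y_lab + y_bal, seed=seed)
    return w, b
```

The sketch retrains from scratch in every round on the labeled set plus a balanced subset of pseudo-labeled samples; an oversampling variant or a kernel SVM could be substituted without changing the overall loop.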
Source
Computer Engineering (《计算机工程》)
CAS
CSCD
2013, Issue 8, pp. 87-91 (5 pages)
Keywords
machine learning
semi-supervised learning
software defect prediction
Biased Support Vector Machine (BSVM)
resampling