Abstract
Software defect prediction commonly suffers from imbalanced data and redundant features. To address these problems, this paper introduces and improves a sparse auto-encoder (SAE) neural network and proposes a new classification model. The model combines the strengths of the SAE neural network and the synthetic minority over-sampling technique (SMOTE). This combination remedies two shortcomings of traditional classification methods in software defect prediction: they neglect classification performance on the minority class, and they fail to preserve the internal features of the data. Experiments on several datasets from the NASA Metrics Data repository show that the proposed model classifies software defects clearly better than the other algorithms compared. In particular, it improves classification precision on the minority class of imbalanced datasets.
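The model uses SMOTE to rebalance the defect data. SMOTE creates synthetic minority-class (defective-module) samples by interpolating between a minority sample and one of its nearest minority-class neighbours. The sketch below is a minimal pure-Python illustration of the standard technique, not the authors' implementation. It uses a simplified variant that picks the seed sample at random for each new row. The function names `smote` and `euclidean`, the default `k=5`, and the toy metric vectors are assumptions for illustration only. How the paper feeds the rebalanced data into its improved SAE network is not shown here.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority, n_synthetic, k=5, seed=None):
    """Return n_synthetic synthetic minority samples (simplified SMOTE sketch).

    Each new sample lies on the segment between a randomly chosen minority
    sample and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    # Precompute the indices of the k nearest minority neighbours of each sample.
    neighbours = []
    for i, x in enumerate(minority):
        dists = sorted(
            (euclidean(x, y), j) for j, y in enumerate(minority) if j != i
        )
        neighbours.append([j for _, j in dists[:k]])

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        nn = minority[rng.choice(neighbours[i])]
        gap = rng.random()  # interpolation factor drawn uniformly from [0, 1)
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nn)])
    return synthetic


if __name__ == "__main__":
    # Hypothetical two-metric vectors for four defective modules.
    defective = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
    new_rows = smote(defective, n_synthetic=6, k=2, seed=0)
    print(len(new_rows))  # 6
```

Interpolating only among minority-class neighbours, rather than duplicating rows, is what lets SMOTE enlarge the defective class without simply repeating existing samples.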
Authors
XU Hai-tao (徐海涛)
GAO Ying (高莹)
SU Na (苏娜)
School of Computer, Hangzhou Dianzi University, Hangzhou 310018, China
Source
Transducer and Microsystem Technologies (《传感器与微系统》)
CSCD
2019, No. 2, pp. 49-51, 62 (4 pages)
Funding
National Natural Science Foundation of China (Grant No. 61572165)
Keywords
over-sampling
sparse auto-encoder (SAE)
neural network
software defect prediction
imbalanced