Abstract
In recent years, the use of machine learning methods to predict software defects has attracted wide attention. In real-world engineering, constructing software defect features requires domain knowledge and considerable time, so the number of available features is usually small. In addition, defective software samples are far fewer than defect-free ones, which makes the data highly imbalanced. In this paper, an explicit feature construction method maps the limited original features into a high-dimensional feature space. An improved Bagging scheme, combined with the random feature subspace method, yields class-balanced training sets while improving the generalization ability of the model. This procedure produces a series of weak classifiers. Finally, a simple linear classifier is trained to learn the weight of each weak classifier, and the weak classifiers are fused with these weights to obtain better classification results.
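The pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract names neither the feature map, the weak learner, nor the way the linear classifier is trained. The degree-2 polynomial map, the nearest-centroid weak learner, the logistic-regression combiner, and all function and parameter names (`poly2_features`, `balanced_bootstrap`, `n_learners`, `subspace`, and so on) are assumptions chosen for illustration.

```python
import math
import random
from itertools import combinations_with_replacement


def poly2_features(x):
    """Explicit map (assumed): original features plus all degree-2 products."""
    return list(x) + [a * b for a, b in combinations_with_replacement(x, 2)]


def balanced_bootstrap(y, rng):
    """Draw equally many defective (1) and clean (0) indices, with replacement."""
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    n = len(pos)  # minority-class size sets the size of each half
    return [rng.choice(pos) for _ in range(n)] + [rng.choice(neg) for _ in range(n)]


def fit_centroid_classifier(X, y, idx, feats):
    """Weak learner (assumed): nearest class centroid on a feature subset."""
    def centroid(label):
        rows = [X[i] for i in idx if y[i] == label]
        return [sum(r[f] for r in rows) / len(rows) for f in feats]

    c1, c0 = centroid(1), centroid(0)

    def vote(x):
        d1 = sum((x[f] - c) ** 2 for f, c in zip(feats, c1))
        d0 = sum((x[f] - c) ** 2 for f, c in zip(feats, c0))
        return 1.0 if d1 < d0 else -1.0  # +1 votes "defective"

    return vote


def fit_linear_combiner(S, y, epochs=200, lr=0.1):
    """Learn one weight per weak classifier with logistic regression (SGD)."""
    w, b = [0.0] * len(S[0]), 0.0
    for _ in range(epochs):
        for s, label in zip(S, y):
            z = b + sum(wi * si for wi, si in zip(w, s))
            z = max(-30.0, min(30.0, z))  # keep exp() in a safe range
            g = 1.0 / (1.0 + math.exp(-z)) - label
            w = [wi - lr * g * si for wi, si in zip(w, s)]
            b -= lr * g
    return w, b


def train(X_raw, y, n_learners=15, subspace=4, seed=0):
    """Feature map -> balanced bagging + random subspaces -> weighted fusion."""
    rng = random.Random(seed)
    X = [poly2_features(x) for x in X_raw]
    learners = []
    for _ in range(n_learners):
        idx = balanced_bootstrap(y, rng)
        feats = rng.sample(range(len(X[0])), subspace)
        learners.append(fit_centroid_classifier(X, y, idx, feats))
    S = [[h(x) for h in learners] for x in X]  # weak-classifier outputs
    w, b = fit_linear_combiner(S, y)

    def predict(x_raw):
        x = poly2_features(x_raw)
        z = b + sum(wi * h(x) for wi, h in zip(w, learners))
        return 1 if z > 0 else 0

    return predict, w
```

Calling `train(X_raw, y)` with a list of numeric feature vectors and 0/1 labels (1 meaning defective) returns a `predict` function and the learned fusion weights. In this sketch the combiner is fitted on the same samples used to build the weak classifiers; fitting it on held-out data would be a reasonable alternative.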
Source
Computer Technology and Development (《计算机技术与发展》)
2015, No. 10, pp. 63-66 (4 pages)
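Funding
National Natural Science Foundation of China (61272273)
Nanjing University of Posts and Telecommunications university research project (XJKY14016)
Jiangsu Province 333 Project (BRA2011175)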