
SBFS: Search Based Feature Selection Framework for Software Defect Prediction
Times cited: 6
Abstract: Defect prediction datasets are usually collected with a large number of metrics related to code complexity or the development process, which introduces the curse of dimensionality into these datasets. Motivated by the idea of search-based software engineering, this paper proposes SBFS, a novel search-based wrapper feature selection framework. SBFS first uses the SMOTE method to alleviate the class imbalance problem in the dataset, and then uses a genetic-algorithm-based feature selection method to select the optimal feature subset from the training set. The empirical study uses the NASA datasets as subjects and compares SBFS with three baseline methods: FW, a wrapper feature selection method based on a forward selection strategy; BW, a wrapper feature selection method based on a backward selection strategy; and Origin, which performs no feature selection. The results show that SBFS is no worse than Origin in 90% of cases, no worse than BW in 82.3% of cases, and no worse than FW in 69.3% of cases. Furthermore, with a decision tree classifier, applying SMOTE improves model performance in 71% of cases, whereas with naive Bayes and logistic regression classifiers, applying SMOTE improves predictive performance in only 47% and 43% of cases, respectively.
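The first step of SBFS rebalances the data with SMOTE. The sketch below shows a simplified variant of the SMOTE idea: pick a minority-class sample at random, pick one of its k nearest minority neighbours, and create a synthetic sample at a random point on the segment between them. It is written in plain Python for illustration only; the names `smote` and `balance_with_smote`, the binary-label assumption (1 = defective), the choice to oversample until both classes are equal in size, and the toy data are all hypothetical. The abstract does not give the SMOTE parameters used in the paper. Oversampling is typically applied to the training data only, so that synthetic points never leak into the test set.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic minority samples by SMOTE-style interpolation
    (simplified, hypothetical helper; not the paper's implementation)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of `base` among the other minority samples
        neighbours = sorted(
            (p for j, p in enumerate(minority) if j != i),
            key=lambda p: math.dist(base, p),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # random position on the segment base -> nb
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic


def balance_with_smote(X, y, minority_label=1, k=5, seed=0):
    """Oversample the minority class until both classes have equal size
    (assumes binary labels, e.g. 1 = defective, 0 = clean)."""
    minority = [x for x, t in zip(X, y) if t == minority_label]
    n_new = max(0, (len(y) - len(minority)) - len(minority))
    new = smote(minority, n_new, k, seed)
    return X + new, y + [minority_label] * len(new)


# Toy training set: 3 defective modules (label 1) vs 5 clean ones (label 0).
X = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0],
     [5.0, 5.0], [6.0, 5.0], [5.0, 6.0], [6.0, 6.0], [7.0, 7.0]]
y = [1, 1, 1, 0, 0, 0, 0, 0]
X_bal, y_bal = balance_with_smote(X, y, k=2)
print(y_bal.count(1), y_bal.count(0))  # 5 5
```

Because every synthetic point is an interpolation between two real minority samples, it stays inside the region spanned by the minority class rather than duplicating existing points as random oversampling would.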
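The second step searches for a feature subset with a genetic algorithm, using a classifier's performance as the fitness (which is what makes it a wrapper method). The following is a minimal sketch under stated assumptions, not the paper's configuration: each individual is a 0/1 mask over the features; the operators (binary tournament selection, one-point crossover, bit-flip mutation, elitism), the population size, generation count and probabilities are illustrative choices; and the fitness is the validation accuracy of a simple nearest-centroid classifier, a hypothetical stand-in for the decision tree, naive Bayes and logistic regression classifiers mentioned in the abstract. The abstract does not state which performance measure the GA optimizes. The validation split is assumed to be carved out of the training set, so the test data play no role in the search.

```python
import random


def nearest_centroid_accuracy(X_tr, y_tr, X_va, y_va, subset):
    """Hypothetical wrapper evaluator: fit a nearest-centroid classifier on the
    features flagged in `subset` (a 0/1 mask) and return validation accuracy."""
    cols = [j for j, bit in enumerate(subset) if bit]
    if not cols:
        return 0.0  # an empty subset cannot classify anything
    centroids = {}
    for label in sorted(set(y_tr)):
        rows = [x for x, t in zip(X_tr, y_tr) if t == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]

    def predict(x):
        return min(centroids, key=lambda c: sum(
            (x[j] - m) ** 2 for j, m in zip(cols, centroids[c])))

    correct = sum(predict(x) == t for x, t in zip(X_va, y_va))
    return correct / len(y_va)


def ga_select(n_features, fitness, pop_size=20, generations=30,
              p_crossover=0.9, p_mutation=None, seed=0):
    """Genetic search over 0/1 feature masks; returns (best_mask, best_fitness)."""
    rng = random.Random(seed)
    if p_mutation is None:
        p_mutation = 1.0 / n_features
    cache = {}

    def fit(mask):
        key = tuple(mask)
        if key not in cache:  # each distinct subset is evaluated only once
            cache[key] = fitness(key)
        return cache[key]

    def tournament(pop):
        a, b = rng.sample(pop, 2)  # binary tournament selection
        return a if fit(a) >= fit(b) else b

    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    best = max(pop, key=fit)
    for _ in range(generations):
        children = [best[:]]  # elitism: the best mask so far always survives
        while len(children) < pop_size:
            p1, p2 = tournament(pop), tournament(pop)
            if n_features > 1 and rng.random() < p_crossover:
                cut = rng.randrange(1, n_features)  # one-point crossover
                c1, c2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            else:
                c1, c2 = p1[:], p2[:]
            for child in (c1, c2):
                for j in range(n_features):
                    if rng.random() < p_mutation:
                        child[j] = 1 - child[j]  # bit-flip mutation
                children.append(child)
        pop = children[:pop_size]
        best = max(pop, key=fit)
    return best, fit(best)


# Toy data: feature 0 separates the classes, features 1-3 are pure noise.
data_rng = random.Random(42)


def make_split(n):
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([3.0 * label + data_rng.gauss(0, 0.5)]
                 + [data_rng.gauss(0, 1) for _ in range(3)])
        y.append(label)
    return X, y


X_tr, y_tr = make_split(60)  # in SBFS this would be the SMOTE-balanced training data
X_va, y_va = make_split(40)  # validation split carved out of the training set
best_mask, best_acc = ga_select(
    4, lambda m: nearest_centroid_accuracy(X_tr, y_tr, X_va, y_va, m))
print(best_mask, best_acc)
```

Caching fitness by mask matters in a wrapper setting, because every evaluation retrains a classifier and the GA revisits the same subsets often.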
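The FW and BW baselines are greedy wrapper searches: forward selection starts from the empty set and adds one feature at a time, while backward selection starts from all features and removes one at a time. The sketch below assumes the textbook sequential variants with a generic `score(subset)` callable; the stopping rules (forward stops on no strict gain, backward keeps removing on ties) and the toy additive score with a size penalty are hypothetical illustrations, since the abstract does not describe the baselines' exact settings. In a real wrapper, `score` would train and evaluate a classifier on the candidate subset.

```python
def forward_selection(n_features, score):
    """Sequential forward selection: greedily add the feature that most
    improves `score`; stop when no single addition gives a strict gain."""
    selected, best = [], float("-inf")
    remaining = list(range(n_features))
    while remaining:
        cand, cand_score = max(
            ((j, score(sorted(selected + [j]))) for j in remaining),
            key=lambda t: t[1])
        if cand_score <= best:
            break
        selected.append(cand)
        remaining.remove(cand)
        best = cand_score
    return sorted(selected), best


def backward_elimination(n_features, score):
    """Sequential backward selection: start from all features and greedily
    drop the one whose removal helps most; a tie still removes it, favouring
    smaller subsets; stop when every removal lowers the score."""
    selected = list(range(n_features))
    best = score(selected)
    while len(selected) > 1:
        cand, cand_score = max(
            ((j, score([f for f in selected if f != j])) for j in selected),
            key=lambda t: t[1])
        if cand_score < best:
            break
        selected.remove(cand)
        best = cand_score
    return selected, best


# Hypothetical toy score: per-feature weights minus a size penalty.
weights = [5, 1, 3, 0]


def toy_score(subset):
    return sum(weights[j] for j in subset) - 2 * len(subset)


print(forward_selection(4, toy_score))     # ([0, 2], 4)
print(backward_elimination(4, toy_score))  # ([0, 2], 4)
```

Both greedy searches evaluate only O(n²) subsets, which keeps them cheap but lets them get stuck where one addition or removal alone does not help; a population-based search such as a GA explores many subsets in parallel and can combine partial solutions through crossover.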
Source: Application Research of Computers (indexed in CSCD; Peking University Core Journal), 2017, No. 4, pp. 1105-1108, 1119 (5 pages)
Funding: National Natural Science Foundation of China (61202006); Jiangsu Province College Student Innovation Training Program (201610304090X)
Keywords: software defect prediction; feature selection; search based software engineering; class imbalance learning