
A Noise Tolerable Feature Selection Framework for Software Defect Prediction (cited by: 17)
Abstract: Software defect prediction mines software historical repositories to build a defect prediction model, which is then used to predict potentially defect-prone program modules in the project under test. However, noise can be introduced during mining, both when labeling program modules and when measuring them with software metrics. Although researchers have analyzed the noise tolerance of existing feature selection methods, to the best of our knowledge few studies in software defect prediction have designed new feature selection methods specifically to tolerate noise. To address this issue, we propose FECS (FEature Clustering with Selection strategies), a noise-tolerant feature selection framework. FECS first uses cluster analysis to partition the original feature set into a specified number of clusters, and then selects the most typical feature from each cluster using three proposed heuristic feature selection strategies. In our empirical study, we use real-world software projects such as Eclipse and NASA as subjects. We first apply a series of data preprocessing steps to improve the quality of these datasets, and then inject class-label noise and feature noise simultaneously to simulate noisy datasets. Comparison with classical feature selection methods as baselines confirms the effectiveness of FECS. In addition, an in-depth analysis of how the noise injection rate, the percentage of selected features, and the noise type affect defect prediction performance provides guidance for using FECS more effectively.
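The abstract does not say which clustering algorithm FECS uses, how its three heuristic strategies score features, or exactly how noise is injected, so the following Python sketch only illustrates the general cluster-then-select idea and a noise-injection setup under stated assumptions. It assumes features are grouped by average-linkage agglomerative clustering on the distance 1 - |Pearson r|, and it offers two illustrative selection heuristics (`"relevance"` and `"centrality"`) that are hypothetical stand-ins, not the paper's strategies. The functions `pearson`, `cluster_features`, `select_features`, and `inject_noise` are likewise hypothetical names; labels are assumed to be binary (1 = defective), and feature noise is assumed to be a uniform draw from the feature's observed range.

```python
import math
import random
from typing import List, Sequence, Tuple


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (z - mb) for x, z in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((z - mb) ** 2 for z in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def cluster_features(cols: List[List[float]], k: int) -> List[List[int]]:
    """Average-linkage agglomerative clustering of feature columns,
    using 1 - |r| as the distance between two features (an assumption)."""
    m = len(cols)
    if not 1 <= k <= m:
        raise ValueError("k must be between 1 and the number of features")
    dist = [[1 - abs(pearson(cols[i], cols[j])) for j in range(m)] for i in range(m)]
    clusters = [[i] for i in range(m)]
    while len(clusters) > k:
        best = None  # (average distance, index p, index q) with p < q
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                pairs = [dist[i][j] for i in clusters[p] for j in clusters[q]]
                d = sum(pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, p, q)
        _, p, q = best
        clusters[p] = clusters[p] + clusters[q]  # merge the closest pair
        del clusters[q]  # q > p, so index p stays valid
    return clusters


def select_features(X: List[List[float]], y: List[int], k: int,
                    strategy: str = "relevance") -> List[int]:
    """Pick one representative feature index per cluster (hypothetical heuristics).
    'relevance': highest |r| with the class label.
    'centrality': highest summed |r| with all members of its cluster."""
    if strategy not in ("relevance", "centrality"):
        raise ValueError(f"unknown strategy: {strategy}")
    cols = [list(c) for c in zip(*X)]  # transpose rows into feature columns
    selected = []
    for cluster in cluster_features(cols, k):
        if strategy == "relevance":
            scores = {i: abs(pearson(cols[i], y)) for i in cluster}
        else:
            scores = {i: sum(abs(pearson(cols[i], cols[j])) for j in cluster)
                      for i in cluster}
        selected.append(max(cluster, key=scores.__getitem__))
    return sorted(selected)


def inject_noise(X: List[List[float]], y: List[int], rate: float,
                 seed: int = 0) -> Tuple[List[List[float]], List[int]]:
    """Simulate a noisy dataset (hypothetical scheme): flip the labels of
    int(rate * n) modules, and overwrite one random feature value in
    int(rate * n) modules with a uniform draw from that feature's range."""
    rng = random.Random(seed)
    n, m = len(y), len(X[0])
    noisy_y = list(y)
    for i in rng.sample(range(n), int(rate * n)):
        noisy_y[i] = 1 - noisy_y[i]  # class-label noise (binary labels)
    noisy_X = [list(row) for row in X]  # copy so the input is not mutated
    for i in rng.sample(range(n), int(rate * n)):
        j = rng.randrange(m)
        col = [row[j] for row in X]
        noisy_X[i][j] = rng.uniform(min(col), max(col))  # feature noise
    return noisy_X, noisy_y
```

On a toy dataset whose second feature is an exact multiple of the first, `select_features(X, y, 2)` places those two columns in the same cluster and keeps only one of them, which illustrates the redundancy removal that a cluster-then-select design presumably aims for. Running the selector on the output of `inject_noise` at several rates is one way to mimic the kind of noise-rate analysis the abstract describes.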
Source: Chinese Journal of Computers (《计算机学报》), indexed in EI, CSCD, and the Peking University Core Journals list; 2018, No. 3, pp. 506-520 (15 pages).
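Funding: National Natural Science Foundation of China (61373012, 61202006, 61321491, 91218302); European Union project (612212); Open Project of the State Key Laboratory for Novel Software Technology, Nanjing University (KFKT2016B18); partially supported by the Nanjing University Program for Enhancing the Innovation Ability of Outstanding Doctoral Students and the Collaborative Innovation Center of Novel Software Technology and Industrialization.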
Keywords: software quality assurance; software defect prediction; feature selection; noise tolerance; cluster analysis