Feature Selection Method Based on Sorting Integration in Software Defect Prediction (cited by: 4)
Abstract: In software defect prediction, defect data sets often contain redundant or irrelevant features, so feature selection is required. To avoid the instability of the ranking-based feature selection methods commonly used in software defect prediction, a feature selection method based on sorting integration is proposed. First, three feature selection methods (correlation coefficient, information gain ratio, and ReliefF) are run separately to obtain feature rankings, and each feature is assigned a weight from each ranking. Next, the weights that the three methods assign to a feature are summed to give that feature's total weight. Finally, the features are sorted by total weight from high to low, and features are selected from the top of the list according to a given percentage of features. In the empirical study, 11 NASA data sets were used as experimental subjects, logistic regression was used to build the prediction models, and the AUC metric was used to measure the classification performance of the different prediction models. The experimental results confirm the effectiveness of the feature selection method based on sorting integration.
Authors: JIANG Li (姜丽); JIANG Shu-juan (姜淑娟); YU Qiao (于巧) (School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116, China; School of Computer Science and Technology, Jiangsu Normal University, Xuzhou 221116, China)
Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), CSCD, Peking University Core Journal, 2018, No. 7, pp. 1410-1414 (5 pages)
Funding: Supported by the National Natural Science Foundation of China (61673384, 61502497)
Keywords: software defect prediction; feature selection; feature weight; sorting integration
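
The three steps in the abstract (rank per method, sum the weights, select a top percentage) can be illustrated with a minimal Python sketch. The abstract does not say how a rank is turned into a weight, so the rule used here (the best of n features gets weight n, the worst gets 1) is an assumption, as are the function names `rank_weights` and `select_features` and the example scores, which are invented for illustration and are not results from the paper.

```python
import math


def rank_weights(scores):
    """Convert one method's raw feature scores into rank-based weights.

    Assumption (not stated in the abstract): with n features, the
    highest-scoring feature gets weight n and the lowest gets 1.
    """
    n = len(scores)
    # Feature indices ordered from highest score to lowest.
    # Python's sort is stable, so tied features keep their original order.
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    weights = [0] * n
    for position, feature in enumerate(order):
        weights[feature] = n - position
    return weights


def select_features(score_lists, fraction):
    """Sum rank-based weights across methods and keep the top fraction.

    score_lists: one list of per-feature scores for each ranking method.
    fraction: share of features to keep, e.g. 0.5 for the top 50%.
    Returns (selected feature indices, total weight of every feature).
    """
    n = len(score_lists[0])
    totals = [0] * n
    for scores in score_lists:
        for feature, weight in enumerate(rank_weights(scores)):
            totals[feature] += weight
    # Sort features by total weight, from high to low.
    ranking = sorted(range(n), key=lambda i: totals[i], reverse=True)
    keep = max(1, math.ceil(fraction * n))
    return ranking[:keep], totals


# Hypothetical scores for five features from the three methods.
correlation = [0.10, 0.40, 0.30, 0.05, 0.20]
gain_ratio = [0.20, 0.50, 0.10, 0.05, 0.30]
relief_f = [0.30, 0.20, 0.40, 0.01, 0.10]

selected, totals = select_features([correlation, gain_ratio, relief_f], fraction=0.5)
print("total weights:", totals)
print("selected feature indices:", selected)
```

In a real experiment the three score lists would come from the actual correlation, gain ratio, and ReliefF evaluators, and the selected columns would then feed a logistic regression model scored by AUC, as the abstract describes.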