Research of software fault prediction based on PU learning
Abstract: In software fault data, positive (faulty) examples are relatively scarce and most samples are hard to label, yet the unlabeled samples carry a great deal of information useful for building a fault detection model. This paper proposes a semi-supervised learning method that builds a classification model from positive and unlabeled data only, in order to detect faults during software development. The method first applies the synthetic minority oversampling technique (SMOTE) to oversample the positive examples and balance the class distribution of the dataset. It then constructs a positive set and an unlabeled set from the balanced data, and uses the POSC 4.5 and Bagging algorithms to build a decision tree ensemble classifier for software fault detection. Comparative experiments on 12 datasets from the NASA MDP database show that modeling with only positive and unlabeled data achieves a fault detection rate close to that of supervised learning, that the ensemble classifier achieves a higher detection rate than a single classifier, and that the size of the unlabeled sample set also affects the fault detection rate.
Source: Application Research of Computers (《计算机应用研究》), CSCD, Peking University Core Journal, 2015, No. 11, pp. 3324-3327, 3331 (5 pages)
Funding: National Natural Science Foundation of China (61303125)
Keywords: software fault prediction; PU learning (positive and unlabeled learning); unbalanced data; decision tree; ensemble classifier
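The first two steps named in the abstract, SMOTE oversampling of the positive class followed by construction of a positive set and an unlabeled set, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `smote` and `make_pu_sets`, the `labeled_fraction` parameter, and the toy data are all assumptions made for illustration, and the abstract does not say how the paper actually partitions the data. The POSC 4.5 and Bagging stages are not shown.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base sample (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position on the segment from base to other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def make_pu_sets(positives, negatives, labeled_fraction=0.3, seed=0):
    """Hypothetical partition: keep a fraction of the positives as the
    labeled set P and hide the rest among the negatives as set U."""
    rng = random.Random(seed)
    shuffled = positives[:]
    rng.shuffle(shuffled)
    n_labeled = max(1, int(labeled_fraction * len(shuffled)))
    P = shuffled[:n_labeled]
    U = shuffled[n_labeled:] + negatives
    rng.shuffle(U)
    return P, U


# Toy data (assumed): 4 faulty modules and 12 clean ones, 2 metrics each.
faulty = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
clean = [(5.0 + 0.1 * i, 4.0 + 0.05 * i) for i in range(12)]

# Oversample the faulty class until both classes have the same size.
new_points = smote(faulty, n_synthetic=len(clean) - len(faulty), k=3)
balanced_faulty = faulty + new_points

# Build the positive set P and the unlabeled set U from the balanced data.
P, U = make_pu_sets(balanced_faulty, clean, labeled_fraction=0.5)
```

Because each synthetic point lies on a segment between two existing faulty samples, oversampling stays inside the region already occupied by the positive class. A PU learner would then be trained on `P` and `U` alone, without ever seeing which members of `U` are faulty.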