
Software Defect Detection with ROCUS (cited by: 10)

Abstract: Software defect detection aims to automatically identify defective software modules so that testing effort can be spent efficiently and the quality of a software system improved. Although many machine learning methods have been successfully applied to the task, most of them fail to consider two practical yet important issues in software defect detection. First, it is rather difficult to collect a large amount of labeled training data for learning a well-performing model; second, a software system usually contains far fewer defective modules than defect-free modules, so learning has to be conducted over an imbalanced data set. In this paper, we address these two practical issues simultaneously by proposing a novel semi-supervised learning approach named Rocus. This method exploits the abundant unlabeled examples to improve detection accuracy, and employs under-sampling to tackle the class-imbalance problem in the learning process. Experimental results on real-world software defect detection tasks show that Rocus is effective for software defect detection. Its performance is better than that of a semi-supervised learning method that ignores the class-imbalance nature of the task and that of a class-imbalance learning method that does not make effective use of unlabeled data.
Source: Journal of Computer Science & Technology (计算机科学技术学报, English edition); indexed in SCIE, EI, CSCD; 2011, No. 2, pp. 328-342 (15 pages)
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 60975043, 60903103, and 60721002.
Keywords: machine learning, data mining, semi-supervised learning, class-imbalance, software defect detection
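
The abstract names two ingredients: using unlabeled modules to improve the detector, and under-sampling the defect-free majority class. The sketch below is only an illustration of how those two ideas can be combined. It is not the Rocus algorithm, whose details the abstract does not give. The nearest-centroid classifier, the self-training loop, and every function and parameter name (`under_sample`, `self_train`, `rounds`, `per_round`) are assumptions chosen for brevity, and the labeled set is assumed to contain at least one module of each class.

```python
import random

def under_sample(xs, ys, rng):
    """Randomly drop majority-class examples until both classes are equal in size."""
    pos = [x for x, y in zip(xs, ys) if y == 1]   # defective modules
    neg = [x for x, y in zip(xs, ys) if y == 0]   # defect-free modules
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    minority_label = 1 if minority is pos else 0
    kept = rng.sample(majority, len(minority))
    balanced_x = minority + kept
    balanced_y = [minority_label] * len(minority) + [1 - minority_label] * len(kept)
    return balanced_x, balanced_y

def centroid(points):
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]

def sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))

def fit(xs, ys):
    """Stand-in base learner: one centroid per class."""
    return {c: centroid([x for x, y in zip(xs, ys) if y == c]) for c in (0, 1)}

def predict(model, x):
    """Return (label, margin); a larger margin means a more confident prediction."""
    d0, d1 = sq_dist(x, model[0]), sq_dist(x, model[1])
    return (1 if d1 < d0 else 0), abs(d0 - d1)

def self_train(labeled_x, labeled_y, unlabeled, rounds=3, per_round=2, seed=0):
    """Each round: balance the labeled set, fit, then pseudo-label the
    unlabeled examples the current model is most confident about."""
    rng = random.Random(seed)
    xs, ys, pool = list(labeled_x), list(labeled_y), list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        model = fit(*under_sample(xs, ys, rng))
        ranked = sorted(pool, key=lambda x: predict(model, x)[1], reverse=True)
        for x in ranked[:per_round]:
            xs.append(x)
            ys.append(predict(model, x)[0])
            pool.remove(x)
    return fit(*under_sample(xs, ys, rng))

# Toy data: five defect-free modules, one defective module, four unlabeled ones.
labeled_x = [(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.5), (5, 5)]
labeled_y = [0, 0, 0, 0, 0, 1]
unlabeled = [(4.5, 5), (5, 4.5), (0.2, 0.3), (0.8, 0.1)]
model = self_train(labeled_x, labeled_y, unlabeled)
```

Here each module is a tuple of numeric features (for example, static code metrics), label 1 marks a defective module, and `predict(model, x)[0]` gives the predicted label for a new module `x`.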


