Software Defect Detection with ROCUS (cited by: 10)

Abstract: Software defect detection aims to automatically identify defective software modules so that testing effort can be focused efficiently and the quality of a software system improved. Although many machine learning methods have been successfully applied to the task, most of them fail to consider two practical yet important issues in software defect detection. First, it is rather difficult to collect a large amount of labeled training data for learning a well-performing model; second, a software system usually contains far fewer defective modules than defect-free modules, so learning has to be conducted over an imbalanced data set. In this paper, we address these two practical issues simultaneously by proposing a novel semi-supervised learning approach named Rocus. This method exploits the abundant unlabeled examples to improve detection accuracy and employs under-sampling to tackle the class-imbalance problem in the learning process. Experimental results on real-world software defect detection tasks show that Rocus is effective for software defect detection. It performs better than a semi-supervised learning method that ignores the class-imbalance nature of the task and better than a class-imbalance learning method that does not make effective use of unlabeled data.
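The two ingredients named in the abstract, under-sampling and the use of unlabeled examples, can be illustrated with a small sketch. This is not the Rocus algorithm from the paper: it is a generic combination of random under-sampling with plain self-training, and the nearest-centroid classifier, the function names, and the parameters `rounds` and `per_round` are all assumptions made for illustration. Label 1 stands for a defective module and label 0 for a defect-free one, and the labeled set is assumed to contain at least one example of each class.

```python
import random

def under_sample(X, y, seed=0):
    """Randomly drop majority-class examples until both classes have equal size."""
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    kept = minority + rng.sample(majority, len(minority))
    return [X[i] for i in kept], [y[i] for i in kept]

def fit_centroids(X, y):
    """Illustrative classifier (an assumption): one mean vector per class."""
    model = {}
    for c in (0, 1):
        rows = [x for x, label in zip(X, y) if label == c]
        model[c] = [sum(col) / len(rows) for col in zip(*rows)]
    return model

def sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))

def predict_with_margin(model, x):
    """Return (predicted label, confidence margin) for one example."""
    d0, d1 = sq_dist(x, model[0]), sq_dist(x, model[1])
    return (1 if d1 < d0 else 0), abs(d0 - d1)

def self_train(X_lab, y_lab, X_unlab, rounds=3, per_round=2, seed=0):
    """Generic self-training: pseudo-label the most confident unlabeled examples."""
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    for r in range(rounds):
        # Rebalance the labeled set before every fit.
        Xb, yb = under_sample(X_lab, y_lab, seed + r)
        model = fit_centroids(Xb, yb)
        ranked = sorted(pool, key=lambda x: predict_with_margin(model, x)[1],
                        reverse=True)
        for x in ranked[:per_round]:
            X_lab.append(x)
            y_lab.append(predict_with_margin(model, x)[0])
            pool.remove(x)
    Xb, yb = under_sample(X_lab, y_lab, seed + rounds)
    return fit_centroids(Xb, yb)

# Toy data: five defect-free modules near the origin, one defective module far away.
X_lab = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (5, 5)]
y_lab = [0, 0, 0, 0, 0, 1]
X_unlab = [(4.5, 5), (5, 4.5), (0.2, 0.1), (0.1, 0.3)]

model = self_train(X_lab, y_lab, X_unlab)
label, _ = predict_with_margin(model, (5.2, 4.8))
```

Rebalancing before each fit keeps the scarce defective examples from being outvoted by the defect-free majority, while the pseudo-labeled examples gradually enlarge the training set.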

Authors: Yuan Jiang (姜远), Ming Li (黎铭), Zhi-Hua Zhou (周志华)

Affiliation: National Key Laboratory for Novel Software Technology

Source: Journal of Computer Science & Technology (计算机科学技术学报, English edition), indexed in SCIE, EI, and CSCD; 2011, No. 2, pp. 328-342 (15 pages)

Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 60975043, 60903103, and 60721002.

Keywords: machine learning, data mining, semi-supervised learning, class-imbalance, software defect detection

Classification: TP311.53 [Automation and Computer Technology - Computer Software and Theory]
