Research of software fault prediction based on PU learning

Chinese title: 基于PU学习的软件故障检测研究 (cited by: 1)

Abstract: In real software fault data, positive (faulty) examples are relatively scarce and most samples are hard to label, yet the unlabeled samples carry a great deal of information that is useful for building a fault detection model. This paper proposes a semi-supervised learning method that builds a classification model from positive and unlabeled data only, in order to detect faults during software development. First, the SMOTE (synthetic minority oversampling technique) algorithm oversamples the positive examples to balance the class distribution of the dataset. The balanced dataset is then partitioned into a positive set and an unlabeled set. Finally, the POSC 4.5 algorithm and the Bagging algorithm are used to build a decision tree ensemble classifier for software fault detection. Comparative experiments on 12 datasets from the NASA MDP database show that modeling with positive and unlabeled data only achieves a fault detection rate close to that of supervised learning, that the ensemble classifier achieves a higher detection rate than a single classifier, and that the size of the unlabeled sample set also affects the fault detection rate.
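The first two steps named in the abstract (SMOTE oversampling of the positive class, then construction of a positive set and an unlabeled set) can be illustrated with a minimal standard-library sketch. It is not the authors' implementation: the function names `smote` and `make_pu_sets`, the parameters `k`, `label_fraction` and `seed`, and the choice to pick base samples at random are all assumptions made for illustration. The classifier step (POSC 4.5 with Bagging) is not shown.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    Simplified SMOTE (assumption): each synthetic point starts from a
    randomly chosen minority sample instead of a fixed per-sample quota.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base sample, excluding itself.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        # The new point lies at a random position on the segment
        # from the base sample to the chosen neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def make_pu_sets(samples, labels, label_fraction=0.3, seed=0):
    """Split fully labelled data into a positive set P and an unlabeled set U.

    Hypothetical helper: a fraction of the positives keeps its label (P);
    the remaining positives are hidden among the negatives (U).
    """
    rng = random.Random(seed)
    pos = [s for s, y in zip(samples, labels) if y == 1]
    neg = [s for s, y in zip(samples, labels) if y == 0]
    rng.shuffle(pos)
    n_labeled = max(1, int(label_fraction * len(pos)))
    positive_set = pos[:n_labeled]
    unlabeled_set = pos[n_labeled:] + neg
    rng.shuffle(unlabeled_set)
    return positive_set, unlabeled_set


if __name__ == "__main__":
    faulty = [[1.0, 2.0], [2.0, 3.0], [3.0, 1.0]]   # toy positive modules
    clean = [[8.0, 9.0], [9.0, 8.0], [7.0, 7.0], [9.0, 9.0], [8.0, 7.0]]
    balanced_pos = faulty + smote(faulty, n_new=2, k=2)
    samples = balanced_pos + clean
    labels = [1] * len(balanced_pos) + [0] * len(clean)
    P, U = make_pu_sets(samples, labels, label_fraction=0.4)
    print(len(P), "labeled positives;", len(U), "unlabeled samples")
```

Varying `label_fraction` changes how many positives end up hidden in the unlabeled set, which is one simple way to probe the abstract's remark that the size of the unlabeled set affects the detection rate.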

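Authors: Zhang He (张荷), Li Mei (李梅), Zhang Yang (张阳), Cai Xiaoyan (蔡晓妍)

Affiliations: College of Information Engineering, Northwest A&F University; College of Mechanical and Electronic Engineering, Northwest A&F University

Source: Application Research of Computers (《计算机应用研究》), CSCD, PKU Core Journal; 2015, No. 11, pp. 3324-3327, 3331 (5 pages)

Funding: National Natural Science Foundation of China (61303125)

Keywords: software fault prediction; PU learning (positive and unlabeled learning); unbalanced data; decision tree; ensemble classifier

Classification: TP311.5 [Automation and Computer Technology: Computer Software and Theory]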

Citation network

References (27)

[1] Guo Lan, Ma Yan, Cukic B, et al. Robust prediction of fault-proneness by random forests[C]//Proc of the 15th International Symposium on Software Reliability Engineering. [S.l.]: IEEE Press, 2004: 417-428.
[2] Gondra I. Applying machine learning to software fault-proneness prediction[J]. Journal of Systems and Software, 2008, 81(2): 186-195.
[3] Li Ming, Zhang Hongyu, Wu Rongxin, et al. Sample-based software defect prediction with active and semi-supervised learning[J]. Automated Software Engineering, 2012, 19(2): 201-230.
[4] Lu Huihua, Cukic B, Culp M. An iterative semi-supervised approach to software fault prediction[C]//Proc of the 7th International Conference on Predictive Models in Software Engineering. [S.l.]: ACM Press, 2011.
[5] Letouzey F, Denis F, Gilleron R. Learning from positive and unlabeled examples[C]//Proc of the 11th International Conference on Algorithmic Learning Theory. [S.l.]: Springer, 2000: 71-85.
[6] Chawla N V, Bowyer K W, Hall L O, et al. SMOTE: synthetic minority over-sampling technique[J]. Journal of Artificial Intelligence Research, 2002, 16(3): 321-357.
[7] Breiman L. Bagging predictors[J]. Machine Learning, 1996, 24(2): 123-140.
[8] NASA/WVU IV&V Facility. Metrics data program[EB/OL]. (2007). http://mdp.ivv.nasa.gov.
[9] Li Renqing, Wang Shihai. An empirical study for software fault-proneness prediction with ensemble learning models on imbalanced data sets[J]. Journal of Software, 2014, 9(3): 697-704.
[10] Catal C. Software fault prediction: a literature review and current trends[J]. Expert Systems with Applications, 2011, 38(4): 4626-4636.

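Citing literature (1)

[1] Bao Zhendong, Zhang Yang, Liu Bin. Software defect prediction based on transfer learning in the PU scenario[J]. Computer Engineering and Design (《计算机工程与设计》), 2018, 39(3): 663-667. Cited by: 1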
