A Survey of Semi-supervised Software Defect Mining (半监督软件缺陷挖掘研究综述)

Cited by: 6

Software Defect Mining Based on Semi-supervised Learning

Abstract: Software quality underpins the safe and reliable operation of computer systems, and software defects are a major cause of poor software quality. Software defect mining identifies potential defects in a software system by analyzing and modeling its source code and related data, and it has drawn wide attention in software quality assurance. Accurately identifying potential defects in software modules requires learning from a large number of modules labeled as defective or non-defective. However, these labels are usually obtained through extensive testing or manual code inspection, which consumes substantial testing and human resources. In practice only a small number of labels can be collected, which seriously constrains the performance of defect mining. To address this problem, semi-supervised learning has been introduced into software defect mining, improving performance by exploiting the large number of unlabeled modules. This paper reviews the research status of semi-supervised software defect mining. It first reviews existing studies on software defect mining, then briefly introduces the four major paradigms of semi-supervised learning, and finally systematically summarizes the methods and techniques for semi-supervised software defect mining.

Authors: Li Ming (黎铭), Huo Xuan (霍轩)

Affiliation: National Key Laboratory for Novel Software Technology, Nanjing University (南京大学计算机软件新技术国家重点实验室)

Source: Journal of Data Acquisition and Processing (《数据采集与处理》), indexed in CSCD and the Peking University Core Journals list (北大核心), 2016, No. 1, pp. 56-64 (9 pages)

Funding: National Natural Science Foundation of China (61422304, 61272217); Natural Science Foundation of Jiangsu Province (BK20131278)

Keywords: software mining; machine learning; semi-supervised learning; software defect mining

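Classification: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]; TP311.5 [Automation and Computer Technology: Computer Software and Theory]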
