Abstract
Software quality underpins the safe and reliable operation of computer systems, and software defects are a major cause of poor software quality. Software defect mining identifies potential defects in a software system by analyzing and modeling its code and related data, and it has drawn wide attention in software quality assurance. Accurately identifying potential defects in software modules requires learning from a large number of modules labeled as defective or non-defective. However, such labels are usually obtained through extensive testing or manual code inspection, which consumes substantial testing and human resources. In practice only a small number of labels can be collected, which seriously constrains the performance of software defect mining. To address this problem, semi-supervised learning has been introduced into software defect mining to improve performance by exploiting the large number of unlabeled modules. This paper surveys the state of research on semi-supervised software defect mining. It first reviews existing studies on software defect mining, then briefly introduces the four major paradigms of semi-supervised learning, and finally systematically summarizes the methods and techniques for software defect mining based on semi-supervised learning.
Source
Journal of Data Acquisition and Processing (《数据采集与处理》)
Indexed in: CSCD; Peking University Core Journals (北大核心)
2016, Issue 1, pp. 56-64 (9 pages)
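Funding
Supported by the National Natural Science Foundation of China (61422304, 61272217)
Supported by the Natural Science Foundation of Jiangsu Province (BK20131278)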
Keywords
software mining
machine learning
semi-supervised learning
software defect mining