Abstract
In software code defect review and defect prediction, researchers currently analyze source code but ignore the defect information associated with that code. By analyzing such defect information, this paper finds that it is a valuable reference for detecting similar defects. Building on this idea, the paper analyzes information about defective code on Stack Overflow and proposes a similar-defect detection method based on defect code feature analysis. The method works in three steps. First, it applies LDA topic modeling to defect reports, classifies each report into a topic (category), and counts reports per category to identify the high-frequency defect categories. Second, it extracts features from the defective code in those high-frequency categories. Finally, it builds a similar-defect detection model from these code features. To validate the model, a diagnostic model is constructed from data-operation defect data and evaluated empirically. The experimental results show that the method detects similar defects in other code effectively.
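The topic-classification step (LDA over defect reports, assigning each report to a category, then counting categories) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the defect reports are already tokenized, fits LDA with a plain collapsed Gibbs sampler, labels each report with its highest-probability topic, and ranks topics by report count. The function names `lda_gibbs` and `high_frequency_categories`, the hyperparameters (`alpha`, `beta`, `iters`), and the sample reports are all hypothetical.

```python
import random
from collections import Counter


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Hypothetical sketch: collapsed Gibbs sampling for LDA.

    docs: list of token lists (tokenized defect reports, assumed given).
    Returns theta, the per-document topic distribution (rows sum to 1).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    wid = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics
    ndk = [[0] * K for _ in docs]      # document-topic counts
    nkw = [[0] * V for _ in range(K)]  # topic-word counts
    nk = [0] * K                       # total tokens assigned to each topic
    z = []                             # current topic of every token

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(K)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][wid[w]] += 1
            nk[k] += 1
        z.append(zd)

    # Resample each token's topic from its full conditional distribution.
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k, v = z[d][i], wid[w]
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1

    # Smoothed document-topic distributions.
    return [
        [(ndk[d][t] + alpha) / (len(doc) + K * alpha) for t in range(K)]
        for d, doc in enumerate(docs)
    ]


def high_frequency_categories(docs, num_topics, top_n=1, **lda_kwargs):
    """Label each report with its dominant topic and return the top_n most frequent topics."""
    theta = lda_gibbs(docs, num_topics, **lda_kwargs)
    labels = [max(range(num_topics), key=lambda t: row[t]) for row in theta]
    return labels, Counter(labels).most_common(top_n)


if __name__ == "__main__":
    # Hypothetical tokenized defect reports, for illustration only.
    reports = [
        ["sql", "query", "null", "database", "insert"],
        ["database", "update", "sql", "transaction"],
        ["array", "index", "bounds", "loop"],
        ["sql", "insert", "null", "column"],
        ["loop", "index", "array", "off", "by", "one"],
    ]
    labels, top = high_frequency_categories(reports, num_topics=2, top_n=1, iters=100)
    print("topic per report:", labels)
    print("most frequent category (topic, count):", top)
```

Fixing `seed` makes a run reproducible. A real pipeline would more likely use an established LDA implementation and a stop-word-filtered vocabulary; the hand-written sampler here only keeps the sketch dependency-free.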
Authors
KANG Zhen-xing (亢振兴)
ZHAO Feng-yu (赵逢禹)
LIU Ya (刘亚)
School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
Source
Journal of Chinese Computer Systems (《小型微型计算机系统》)
CSCD
Peking University Core Journal
2021, No. 3, pp. 661-665 (5 pages)
Funding
Supported by the National Natural Science Foundation of China (61803264).