Abstract
Fault localization is an important part of the software development process. Making full use of a program's structural features and behavioral characteristics helps improve fault localization efficiency. This paper proposes a fault localization framework based on multivariate logistic regression analysis, which locates faults at the class-method level in the new version of an evolving program. First, a set of metrics for structural features and behavioral characteristics is designed; the feature data set of the old version is collected and built through static analysis and by testing the program, while the fault information of the old version is obtained from the bug tracking system. Second, based on this feature data set and fault information, a univariate analysis selects the metrics that are significantly related to faults, and a multivariate logistic model is then trained on the selected metrics. Finally, the feature data set of the new version is collected and built from the selected metrics, the trained logistic model predicts the fault probability of each class method, and class methods are inspected in descending order of predicted fault probability to locate faults. An empirical study on a set of open-source programs indicates that the multivariate logistic model can improve the efficiency of fault localization.
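The three steps summarized in the abstract (univariate screening, multivariate training, ranking by predicted fault probability) can be illustrated with a minimal sketch. Everything concrete in it is an assumption made for illustration and is not taken from the paper: the three metrics (a normalized complexity score, a "covered by a failing test" flag, and an uninformative flag), the toy old-version data, the method names, the 0.05 significance threshold, the use of a likelihood-ratio test as the univariate significance test, and fitting by plain gradient ascent.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict(w, row):
    """Fault probability for one metric vector; w[0] is the intercept."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], row)))

def fit_logistic(X, y, lr=1.0, epochs=2000):
    """Fit a logistic model by batch gradient ascent on the log-likelihood."""
    n, k = len(X), len(X[0])
    w = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for row, label in zip(X, y):
            err = label - predict(w, row)
            grad[0] += err
            for j, xj in enumerate(row):
                grad[j + 1] += err * xj
        w = [wj + lr * g / n for wj, g in zip(w, grad)]
    return w

def log_likelihood(w, X, y):
    total = 0.0
    for row, label in zip(X, y):
        p = min(max(predict(w, row), 1e-12), 1.0 - 1e-12)
        total += math.log(p) if label else math.log(1.0 - p)
    return total

def univariate_p_value(column, y):
    """Likelihood-ratio test: one-metric model vs. intercept-only model.
    Assumed stand-in for the paper's univariate analysis."""
    X1 = [[v] for v in column]
    ll_full = log_likelihood(fit_logistic(X1, y), X1, y)
    base = sum(y) / len(y)
    ll_null = sum(math.log(base if label else 1.0 - base) for label in y)
    stat = max(0.0, 2.0 * (ll_full - ll_null))
    return math.erfc(math.sqrt(stat / 2.0))  # chi-square tail, 1 degree of freedom

def select_metrics(X, y, alpha=0.05):
    """Indices of metrics whose univariate p-value is below alpha."""
    return [j for j in range(len(X[0]))
            if univariate_p_value([row[j] for row in X], y) < alpha]

def rank_methods(w, selected, methods):
    """Order method names by descending predicted fault probability."""
    scored = [(predict(w, [vec[j] for j in selected]), name)
              for name, vec in methods.items()]
    return [name for _, name in sorted(scored, key=lambda t: t[0], reverse=True)]

# Toy old-version data (invented): [complexity, failing_coverage, noise_flag].
old_X, old_y = [], []
for complexity in (0.0, 0.25, 0.5, 0.75, 1.0):
    for failing_coverage in (0, 1):
        for noise_flag in (0, 1):
            for replicate in (0, 1):  # replicate adds label noise
                old_X.append([complexity, failing_coverage, noise_flag])
                risk = complexity + 0.5 * failing_coverage + 0.5 * replicate
                old_y.append(1 if risk >= 1.0 else 0)

# Step 2: univariate screening, then multivariate training on selected metrics.
selected = select_metrics(old_X, old_y)
weights = fit_logistic([[row[j] for j in selected] for row in old_X], old_y)

# Step 3: score hypothetical new-version methods and rank them for inspection.
new_methods = {
    "Util.log": [0.0, 0, 1],
    "Parser.parse": [1.0, 1, 0],
    "Cache.get": [0.5, 0, 1],
}
ranking = rank_methods(weights, selected, new_methods)
print("selected metric indices:", selected)
print("inspection order:", ranking)
```

In this toy setting the uninformative flag is perfectly balanced across faulty and non-faulty methods, so the screening step drops it and only the two informative metrics reach the multivariate model. A real study would more likely rely on a statistics package for fitting and significance testing rather than the hand-written routines used here.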
Source
《计算机工程与科学》
CSCD
Peking University Core Journal
2014, No. 10, pp. 1952-1960 (9 pages)
Computer Engineering & Science
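Funding
National Natural Science Foundation of China (61202006, 61340037)
Fundamental Research Funds for the Central Universities (2013QNB17)
Natural Science Research Project of Jiangsu Higher Education Institutions (12KJB520014)
Jiangsu Graduate Training Innovation Project (CXZZ12_0935)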