
Cross-project Defect Prediction Method Using Adversarial Learning (一种采用对抗学习的跨项目缺陷预测方法)
Cited by: 5
Abstract Cross-project defect prediction (CPDP) has become an important research direction in software engineering data mining. It uses defective code from other projects to build prediction models, which addresses the shortage of training data during model construction. However, the code files of the source and target projects differ in data distribution, which leads to poor cross-project prediction results. Drawing on the adversarial learning idea of the generative adversarial network (GAN), the proposed approach uses a discriminator to shift the distribution of target project features toward that of the source project features, thereby improving cross-project defect prediction performance. Specifically, the proposed abstract continuous GAN (AC-GAN) method consists of two stages: data processing and model construction. (1) The source and target project code is first converted into abstract syntax trees (ASTs); the ASTs are then traversed depth-first to obtain node (token) sequences; the continuous bag-of-words model (CBOW) is used to generate word vectors, and the token sequences are converted into numeric vectors according to the word vector table. (2) The processed numeric vectors are fed into a GAN-structured model for feature extraction and data migration, after which a binary classifier determines whether each target project code file is defective. AC-GAN was evaluated in comparison experiments on 15 source-target project pairs, and the experimental results demonstrate the effectiveness of the method.
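The data processing stage described in the abstract (AST construction, depth-first traversal, conversion to numeric vectors) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the abstract does not say which parser or programming language is used, so Python's standard `ast` module stands in here, and the function names `dfs_node_sequence`, `build_vocab`, and `encode` are hypothetical. The CBOW embedding step is omitted; the integer ids produced below would be the lookup keys into a word vector table trained separately.

```python
import ast


def dfs_node_sequence(source):
    """Parse source code and return AST node type names in depth-first (pre-order) order."""
    tree = ast.parse(source)
    sequence = []

    def visit(node):
        sequence.append(type(node).__name__)
        for child in ast.iter_child_nodes(node):
            visit(child)

    visit(tree)
    return sequence


def build_vocab(sequences):
    """Assign each distinct node name an integer id; id 0 is reserved for padding."""
    vocab = {}
    for seq in sequences:
        for token in seq:
            if token not in vocab:
                vocab[token] = len(vocab) + 1
    return vocab


def encode(seq, vocab, length):
    """Map a node sequence to a fixed-length list of ids (truncate or zero-pad).

    Tokens missing from the vocabulary also map to 0 in this simplified sketch.
    """
    ids = [vocab.get(tok, 0) for tok in seq][:length]
    return ids + [0] * (length - len(ids))


if __name__ == "__main__":
    seq = dfs_node_sequence("def f(a):\n    return a + 1\n")
    vocab = build_vocab([seq])
    print(seq)
    print(encode(seq, vocab, 16))
```

In a full pipeline, the fixed-length id sequences from both the source and target projects would be passed through the CBOW word vector table before being fed to the GAN-structured model.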
Authors XING Ying (邢颖); QIAN Xiao-Meng (钱晓萌); GUAN Yu (管宇); ZHANG Shi-Hao (章世豪); ZHAO Meng-Ci (赵梦赐); LIN Wan-Ting (林婉婷) (School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing 100876, China; School of Modern Post (School of Automation), Beijing University of Posts and Telecommunications, Beijing 100876, China)
Source Journal of Software (《软件学报》); indexed in EI, CSCD, and the Peking University Core Journals list; 2022, No. 6, pp. 2097-2112 (16 pages)
Funding National Natural Science Foundation of China (61702044); National Key Research and Development Program of China (2017YFD0401001).
Keywords cross-project defect prediction; generative adversarial network (GAN); continuous bag-of-words model (CBOW); abstract syntax tree (AST)