Abstract
Cross-project defect prediction can use existing labeled defect datasets to predict defects in a new, unlabeled project, but it requires the source and target projects to share the same set of metrics, which makes it hard to apply in real development. Heterogeneous defect prediction removes this requirement: it predicts defects across projects whose metric sets differ, and it has therefore attracted considerable research interest. Existing heterogeneous defect prediction models rely on naive or traditional machine learning methods to learn feature representations for the source and target projects. The representations learned this way are weak, and the resulting defect prediction performance is poor. Given the strong feature extraction and representation capabilities of deep neural networks, this study proposes a feature representation method for heterogeneous defect prediction based on variational autoencoders. By combining the variational autoencoder with the maximum mean discrepancy distance, the method learns a common feature representation of the source and target projects, on which an effective defect prediction model can then be trained. Experiments on multiple defect datasets, comparing the method with traditional cross-project defect prediction methods and with heterogeneous defect prediction methods, verify its effectiveness.
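The abstract names maximum mean discrepancy (MMD) as the distance that aligns source and target representations. The following is a minimal sketch of how such a distance can be estimated between two batches of latent codes, not the paper's implementation. The Gaussian kernel, the bandwidth `gamma`, the latent dimension of 8, and the names `rbf_kernel`, `mmd2`, `z_src`, and `z_tgt_*` are all illustrative assumptions; the synthetic codes stand in for the outputs of two encoders that map differently sized metric sets into one shared space.

```python
import numpy as np

def rbf_kernel(a, b, gamma):
    # Pairwise Gaussian kernel k(x, y) = exp(-gamma * ||x - y||^2).
    sq = (a ** 2).sum(1)[:, None] + (b ** 2).sum(1)[None, :] - 2.0 * a @ b.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def mmd2(zs, zt, gamma=1.0):
    # Biased estimate of the squared MMD between two samples of latent codes.
    return (rbf_kernel(zs, zs, gamma).mean()
            + rbf_kernel(zt, zt, gamma).mean()
            - 2.0 * rbf_kernel(zs, zt, gamma).mean())

# Hypothetical latent codes (assumption: both encoders output 8 dimensions).
rng = np.random.default_rng(0)
z_src = rng.normal(0.0, 1.0, size=(200, 8))       # source project codes
z_tgt_near = rng.normal(0.0, 1.0, size=(150, 8))  # target codes, same distribution
z_tgt_far = rng.normal(2.0, 1.0, size=(150, 8))   # target codes, shifted distribution

mmd_near = mmd2(z_src, z_tgt_near, gamma=0.1)
mmd_far = mmd2(z_src, z_tgt_far, gamma=0.1)
print(mmd_near, mmd_far)
```

In a training setup of this kind, a term like `mmd2` would typically be added to the autoencoder loss so that minimizing it pulls the two latent distributions together; how the paper weights or combines the terms is not stated in the abstract.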
Authors
贾修一
张文舟
李伟湋
黄志球
JIA Xiu-Yi; ZHANG Wen-Zhou; LI Wei-Wei; HUANG Zhi-Qiu (School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China; College of Aerospace Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China; College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China)
Source
《软件学报》
EI
CSCD
Peking University Core Journals (北大核心)
2021, Issue 7, pp. 2204-2218 (15 pages)
Journal of Software
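Funding
National Natural Science Foundation of China (61906090, U20B2064, 61773208)
Natural Science Foundation of Jiangsu Province (BK20191287, BK20170809)
Fundamental Research Funds for the Central Universities (30920021131)
China Postdoctoral Science Foundation (2018M632304)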
Keywords
heterogeneous defect prediction
variational autoencoders
feature representation