Abstract
In practical applications of software defect prediction, the project to be predicted may be entirely new, or its historical data may be insufficient. One solution is to build a model on an existing project with sufficient data (the source project) and use it to predict defects in the new project (the target project). Existing approaches mainly apply traditional machine learning methods to perform feature transfer learning between the source and target projects. However, data from different projects differ considerably in distribution, and the features learned by traditional machine learning methods have weak representational power, which leads to poor defect prediction performance. To address this problem, this paper takes a deep learning perspective and proposes a cross-project defect prediction method based on stacked denoising autoencoders (SDAE). The method combines stacked denoising autoencoders with the maximum mean discrepancy (MMD) distance, so that it can effectively extract deep-level feature representations that transfer between the source and target projects. On these features, an effective model for predicting the number of defects can be trained. Experimental results on the Relink and AEEEM datasets show that, compared with the classical cross-project defect prediction methods Burak filtering, Peters filtering, TCA, and TCA+, the proposed method achieves the best prediction results in most cases.
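The abstract names the two ingredients (a denoising autoencoder layer and the MMD distance) without giving the loss. The following Python is a minimal sketch, under stated assumptions, of how they can be combined in one layer's objective. It computes a reconstruction error on noise-corrupted inputs from both projects plus `lam` times the squared MMD between the hidden representations of the source and target samples. The Gaussian kernel and its bandwidth `sigma`, masking noise with rate `drop_prob`, tied sigmoid weights, min-max normalised inputs, the trade-off weight `lam`, and every function name here are illustrative assumptions. None of them is the paper's published implementation.

```python
import math
import random


def gaussian_kernel(x, y, sigma=1.0):
    """Assumed kernel: k(x, y) = exp(-||x - y||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mmd2(source, target, sigma=1.0):
    """Biased empirical estimate of the squared MMD between two samples."""
    def mean_k(xs, ys):
        total = sum(gaussian_kernel(a, b, sigma) for a in xs for b in ys)
        return total / (len(xs) * len(ys))
    return mean_k(source, source) + mean_k(target, target) - 2.0 * mean_k(source, target)


def corrupt(x, drop_prob, rng):
    """Assumed masking noise: zero each feature independently with probability drop_prob."""
    return [0.0 if rng.random() < drop_prob else v for v in x]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def encode(x, W, b):
    """One hidden layer h = sigmoid(W x + b); W has one row per hidden unit."""
    return [sigmoid(sum(w * v for w, v in zip(row, x)) + bi) for row, bi in zip(W, b)]


def decode(h, W, c):
    """Assumed tied weights: x_hat = sigmoid(W^T h + c)."""
    return [sigmoid(sum(W[j][i] * h[j] for j in range(len(h))) + c[i]) for i in range(len(c))]


def layer_objective(source, target, W, b, c, lam=1.0, drop_prob=0.2, seed=0):
    """Hypothetical per-layer loss: mean denoising reconstruction error + lam * MMD^2."""
    rng = random.Random(seed)
    recon = 0.0
    # Reconstruct the clean input from its corrupted version, for both projects.
    for x in source + target:
        x_hat = decode(encode(corrupt(x, drop_prob, rng), W, b), W, c)
        recon += sum((xi - xh) ** 2 for xi, xh in zip(x, x_hat))
    recon /= len(source) + len(target)
    # Penalise the distribution gap between source and target hidden features.
    h_src = [encode(x, W, b) for x in source]
    h_tgt = [encode(x, W, b) for x in target]
    return recon + lam * mmd2(h_src, h_tgt, sigma=1.0)
```

In a stacked setup, each layer would presumably be trained by minimising such an objective (for example with gradient descent), and its hidden outputs would be fed to the next layer. The training loop is omitted here, and the paper's exact loss, noise model, and optimiser may differ.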
Authors
LIU Lu-yao (刘路瑶)
HAN Pei-sheng (韩培胜)
School of Cryptography, University of Information Engineering, Zhengzhou 450000, China
Source
Computer and Modernization (计算机与现代化), 2023, No. 4, pp. 32-38, 46 (8 pages)
Funding
National Natural Science Foundation of China (61572517)
Keywords
cross-project software defect prediction
stacked denoising autoencoders
maximum mean discrepancy distance
deep feature representation