14 articles found
Cross-Project Software Defect Prediction Based on SMOTE and Deep Canonical Correlation Analysis
1
Authors: Xin Fan, Shuqing Zhang, Kaisheng Wu, Wei Zheng, Yu Ge. Computers, Materials & Continua (SCIE, EI), 2024, No. 2, pp. 1687-1711 (25 pages)
Cross-Project Defect Prediction (CPDP) is a method that uses historical data from other source projects to train predictive models for defect prediction in the target project. However, existing CPDP methods only consider linear correlations between features (indicators) of the source and target projects. These models cannot evaluate non-linear correlations between features when they exist, for example, when the data distributions of the source and target projects differ. As a result, the performance of such CPDP models is compromised. This paper proposes a novel CPDP method based on the Synthetic Minority Oversampling Technique (SMOTE) and Deep Canonical Correlation Analysis (DCCA), referred to as S-DCCA. Canonical Correlation Analysis (CCA) is employed to address the issue of non-linear correlations between features of the source and target projects. S-DCCA extends CCA by incorporating the MlpNet model for feature extraction from the dataset. The redundant features are then eliminated by maximizing the correlated feature subset using the CCA loss function. Finally, cross-project defect prediction is achieved through the application of the SMOTE data sampling technique. Area Under Curve (AUC) and F1 score (F1) are used as evaluation metrics. Experiments on 27 projects from four public datasets validate the proposed method. The results demonstrate that, on average, the method outperforms all baseline approaches by at least 1.2% in AUC and 5.5% in F1 score, which indicates that the proposed method performs favorably.
Keywords: cross-project defect prediction; deep canonical correlation analysis; feature similarity
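The SMOTE step named in this abstract can be illustrated in isolation. The following is a minimal sketch of the general SMOTE idea (interpolating between a minority-class sample and one of its nearest minority-class neighbours), not the S-DCCA implementation; the function name `smote`, its parameters, and the toy data are assumptions made for illustration.

```python
import random

def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic rows from a list of minority-class rows.

    Each synthetic row lies on the line segment between a randomly chosen
    minority row and one of its k nearest minority neighbours.
    Assumes at least two minority rows (hypothetical helper, not a library API).
    """
    rng = random.Random(seed)

    def dist2(a, b):
        # Squared Euclidean distance between two rows.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority rows.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: dist2(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic

# Toy "defective module" rows with two metrics each (made-up values).
defective = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.4]]
new_rows = smote(defective, n_new=3)
```

Because every synthetic row is a convex combination of two existing rows, it always stays inside the bounding box of the minority class, which is why SMOTE densifies the defective class rather than extrapolating beyond it.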
Graph-Based Feature Learning for Cross-Project Software Defect Prediction
2
Authors: Ahmed Abdu, Zhengjun Zhai, Hakim A. Abdo, Redhwan Algabri, Sungon Lee. Computers, Materials & Continua (SCIE, EI), 2023, No. 10, pp. 161-180 (20 pages)
Cross-project software defect prediction (CPDP) aims to enhance defect prediction in target projects with limited or no historical data by leveraging information from related source projects. Existing CPDP approaches rely on static metrics or dynamic syntactic features, which have shown limited effectiveness in CPDP because they cannot capture higher-level system properties that are important for CPDP, such as complex design patterns, relationships between multiple functions, and dependencies in different software projects. This paper introduces a novel approach, a graph-based feature learning model for CPDP (GB-CPDP), that uses NetworkX to extract features and learn representations of program entities from control flow graphs (CFGs) and data dependency graphs (DDGs). These graphs capture the structural and data dependencies within the source code. The proposed approach employs Node2Vec to transform CFGs and DDGs into numerical vectors and leverages Long Short-Term Memory (LSTM) networks to learn predictive models. The process involves graph construction, feature learning through graph embedding and LSTM, and defect prediction. Experimental evaluation using nine open-source Java projects from the PROMISE dataset demonstrates that GB-CPDP outperforms state-of-the-art CPDP methods in terms of F1-measure and Area Under the Curve (AUC). The results showcase the effectiveness of GB-CPDP in improving the performance of cross-project defect prediction.
Keywords: cross-project defect prediction; graphs features; deep learning; graph embedding
Within-Project and Cross-Project Software Defect Prediction Based on Improved Transfer Naive Bayes Algorithm (Cited by: 3)
3
Authors: Kun Zhu, Nana Zhang, Shi Ying, Xu Wang. Computers, Materials & Continua (SCIE, EI), 2020, No. 5, pp. 891-910 (20 pages)
With the continuous expansion of software scale, software update and maintenance have become more and more important. However, frequent code updates make the software more likely to introduce new defects, so predicting defects quickly and accurately on software changes has become an important problem for software developers. Current defect prediction methods often cannot reflect the feature information of the defect comprehensively, and their detection performance is not ideal. Therefore, this paper proposes a novel defect prediction model named ITNB (Improved Transfer Naive Bayes), based on an improved transfer Naive Bayesian algorithm, which mainly considers the following two aspects: (1) Considering that the edge data of the test set may affect the similarity calculation and the final prediction result, the edge data of the test set are removed when calculating the data similarity between the training set and the test set; (2) Considering that each feature dimension has a different effect on defect prediction, the training data weight is computed from the feature dimension weight and data gravity, and the prior probability and the conditional probability of the training data are then calculated from the weight information, so as to construct a weighted Bayesian classifier for software defect prediction. To evaluate the performance of the ITNB model, six datasets from large open-source projects are used, namely Bugzilla, Columba, Mozilla, JDT, Platform and PostgreSQL. The ITNB model is compared with the transfer Naive Bayesian (TNB) model. The experimental results show that the ITNB model achieves better results than the TNB model in terms of accuracy, precision and pd for within-project and cross-project defect prediction.
Keywords: cross-project defect prediction; transfer Naive Bayesian algorithm; edge data; similarity calculation; feature dimension weight
Unsupervised Domain Adaptation Based on Discriminative Subspace Learning for Cross-Project Defect Prediction (Cited by: 1)
4
Authors: Ying Sun, Yanfei Sun, Jin Qi, Fei Wu, Xiao-Yuan Jing, Yu Xue, Zixin Shen. Computers, Materials & Continua (SCIE, EI), 2021, No. 9, pp. 3373-3389 (17 pages)
Cross-project defect prediction (CPDP) aims to predict the defects of a target project by using a prediction model built on source projects. The main problem in CPDP is the large distribution gap between the source project and the target project, which prevents the prediction model from performing well. Most existing methods overlook the class discrimination of the learned features, and finding a model that transfers effectively from the source project to the target project is challenging. This paper proposes an unsupervised domain adaptation approach based on discriminative subspace learning (DSL) for CPDP. DSL treats the data from two projects as coming from two domains and maps the data into a common feature space. It employs cross-domain alignment with discriminative information from different projects to reduce the distribution difference between the projects' data and incorporates the class discriminative information. Specifically, DSL first uses subspace-learning-based domain adaptation to reduce the distribution gap between the data of different projects. Then, it makes full use of the class label information of the source project and transfers the discrimination ability of the source project to the target project in the common space. Comprehensive experiments on five projects verify that DSL can build an effective prediction model and improve performance over the related competing methods by at least 7.10% and 11.08% in terms of G-measure and AUC, respectively.
Keywords: cross-project defect prediction; discriminative subspace learning; unsupervised domain adaptation
Defect Prediction Using Akaike and Bayesian Information Criterion (Cited by: 2)
5
Authors: Saleh Albahli, Ghulam Nabi Ahmad Hassan Yar. Computer Systems Science & Engineering (SCIE, EI), 2022, No. 6, pp. 1117-1127 (11 pages)
Data available in software engineering for many applications contain variability, and it is not possible to say in advance which variables help the prediction. Most work in software defect prediction focuses on selecting the best prediction technique, and deep learning and ensemble models have shown promising results for this purpose. In contrast, very few studies deal with cleaning the training data and selecting the best parameter values from the data. Sometimes the data available for training have high variability, and this variability may decrease model accuracy. To deal with this problem, the Akaike information criterion (AIC) and the Bayesian information criterion (BIC) are used to select the best variables for training the model. A simple ANN model with one input layer, one output layer and two hidden layers is used for training instead of a very deep and complex model. AIC and BIC values are calculated, and the combination with the minimum AIC and BIC values is selected as the best model. First, the variables are narrowed down to a smaller number using correlation values. Then subsets are formed for all possible variable combinations. Finally, an artificial neural network (ANN) model is trained for each subset, and the best model is selected on the basis of the smallest AIC and BIC values. The combination of only two variables, ns and entropy, is found to be the best for software defect prediction, as it gives the minimum AIC and BIC values, whereas nm and npt is the worst combination and gives the maximum AIC and BIC values.
Keywords: software defect prediction; machine learning; AIC; BIC; model selection; cross-project defect prediction
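The selection rule described in this abstract (train one model per variable subset, keep the subset with the smallest AIC and BIC) can be sketched numerically. The sketch assumes Gaussian errors, so that AIC and BIC reduce to n·ln(RSS/n) + 2k and n·ln(RSS/n) + k·ln(n) up to a constant shared by all candidates; the subset names, residual sums of squares and parameter counts are invented for illustration and are not results from the paper.

```python
import math

def aic_bic(rss, n, k):
    """Return (AIC, BIC) for a model with Gaussian errors.

    rss: residual sum of squares on n observations
    k:   number of fitted parameters
    Both values omit an additive constant that is identical for all
    models fitted to the same n observations.
    """
    fit = n * math.log(rss / n)
    return fit + 2 * k, fit + k * math.log(n)

# Hypothetical candidates: variable subset -> (RSS, number of parameters).
candidates = {
    ("x1", "x2"): (12.0, 3),
    ("x3", "x4"): (30.0, 3),
    ("x1", "x2", "x3"): (11.9, 4),
}
n = 100  # assumed number of training observations

scores = {subset: aic_bic(rss, n, k) for subset, (rss, k) in candidates.items()}
best_by_aic = min(scores, key=lambda s: scores[s][0])
best_by_bic = min(scores, key=lambda s: scores[s][1])
```

In this toy setup the three-variable subset fits slightly better but is penalized for its extra parameter, so both criteria prefer the two-variable subset; BIC penalizes the extra parameter more heavily than AIC whenever ln(n) > 2.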
A Cluster Based Feature Selection Method for Cross-Project Software Defect Prediction (Cited by: 7)
6
Authors: Chao Ni, Wang-Shu Liu, Xiang Chen, Qing Gu, Dao-Xu Chen, Qi-Guo Huang. Journal of Computer Science & Technology (SCIE, EI, CSCD), 2017, No. 6, pp. 1090-1107 (18 pages)
Cross-project defect prediction (CPDP) uses the labeled data from external source software projects to compensate for the shortage of useful data in the target project, in order to build a meaningful classification model. However, the distribution gap between software features extracted from the source and the target projects may be too large to make the mixed data useful for training. This paper proposes a novel cluster-based method, FeSCH (Feature Selection Using Clusters of Hybrid-Data), to alleviate the distribution differences by feature selection. FeSCH includes two phases. The feature clustering phase clusters features using a density-based clustering method, and the feature selection phase selects features from each cluster using a ranking strategy. For CPDP, three different heuristic ranking strategies are designed for the second phase. To investigate the prediction performance of FeSCH, experiments are designed on real-world software projects, and the effects of design options in FeSCH (such as ranking strategy, feature selection ratio, and classifiers) are studied. The experimental results demonstrate the effectiveness of FeSCH. First, compared with the state-of-the-art baseline methods, FeSCH achieves better performance, and its performance is less affected by the classifiers used. Second, FeSCH enhances the performance by effectively selecting features across feature categories, and provides guidelines for selecting useful features for defect prediction.
Keywords: software defect prediction; cross-project defect prediction; feature selection; feature clustering; density-based clustering
A Novel Cross-Project Software Defect Prediction Algorithm Based on Transfer Learning (Cited by: 5)
7
Authors: Shiqi Tang, Song Huang, Changyou Zheng, Erhu Liu, Cheng Zong, Yixian Ding. Tsinghua Science and Technology (SCIE, EI, CAS, CSCD), 2022, No. 1, pp. 41-57 (17 pages)
Software Defect Prediction (SDP) technology is an effective tool for improving software system quality that has attracted much attention in recent years. However, the prediction of cross-project data remains a challenge for the traditional SDP method because the training and testing datasets follow different distributions. Another major difficulty is the class imbalance issue that must be addressed in Cross-Project Defect Prediction (CPDP). This work proposes a transfer-learning algorithm (TSboostDF) that considers both knowledge transfer and class imbalance for CPDP. The experimental results demonstrate that TSboostDF performs better than existing CPDP methods.
Keywords: software defect prediction (SDP); transfer learning; imbalance class; cross-project
Combined classifier for cross-project defect prediction: an extended empirical study (Cited by: 2)
8
Authors: Yun ZHANG, David LO, Xin XIA, Jianling SUN. Frontiers of Computer Science (SCIE, EI, CSCD), 2018, No. 2, pp. 280-296 (17 pages)
To help developers allocate their testing and debugging efforts effectively, many software defect prediction techniques have been proposed in the literature. These techniques can be used to predict classes that are more likely to be buggy based on the past history of classes, methods, or certain other code elements. These techniques are effective provided that a sufficient amount of data is available to train a prediction model. However, sufficient training data are rarely available for new software projects. To resolve this problem, cross-project defect prediction, which transfers a prediction model trained using data from one project to another, was proposed and is regarded as a new challenge in the area of defect prediction. Thus far, only a few cross-project defect prediction techniques have been proposed. To advance the state of the art, this study investigates seven composite algorithms that integrate multiple machine learning classifiers to improve cross-project defect prediction. To evaluate the performance of the composite algorithms, experiments are performed on 10 open-source software systems from the PROMISE repository, which contain a total of 5,305 instances labeled as defective or clean. The composite algorithms are compared with the combined defect predictor in which logistic regression is used as the meta classification algorithm (CODEPLogistic), which is the most recent cross-project defect prediction algorithm, in terms of two standard evaluation metrics: cost effectiveness and F-measure. The experimental results show that several algorithms outperform CODEPLogistic: Maximum voting shows the best performance in terms of F-measure, and its average F-measure is superior to that of CODEPLogistic by 36.88%. Bootstrap aggregation (Bagging J48) shows the best performance in terms of cost effectiveness, and its average cost effectiveness is superior to that of CODEPLogistic by 15.34%.
Keywords: defect prediction; cross-project; classifier combination
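Maximum voting, the best composite algorithm by F-measure in this abstract, can be illustrated with a short sketch. It assumes each base classifier has already produced one hard label per instance; the function name `max_vote` and the toy predictions are hypothetical and do not reproduce the study's setup.

```python
from collections import Counter

def max_vote(predictions):
    """Combine hard labels from several classifiers by majority vote.

    predictions: list of label lists, one list per classifier, all of the
    same length. Returns one label per instance. With an odd number of
    binary classifiers there are no ties; otherwise the label seen first
    among the tied ones wins (Counter insertion order).
    """
    voted = []
    for labels in zip(*predictions):  # labels of all classifiers for one instance
        voted.append(Counter(labels).most_common(1)[0][0])
    return voted

# Made-up outputs of three base classifiers (1 = defective, 0 = clean).
preds = [
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 1, 0],
]
combined = max_vote(preds)  # [1, 1, 1, 0]
```

Each of the first three instances is flagged by two of the three classifiers, so the vote marks them defective even though no single classifier flags all three.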
Effort-aware cross-project just-in-time defect prediction framework for mobile apps
9
Authors: Tian CHENG, Kunsong ZHAO, Song SUN, Muhammad MATEEN, Junhao WEN. Frontiers of Computer Science (SCIE, EI, CSCD), 2022, No. 6, pp. 15-29 (15 pages)
With the boom in mobile devices, Android mobile apps play an irreplaceable role in people's daily lives and are characterized by frequent updates involving many code commits to meet new requirements. Just-in-Time (JIT) defect prediction aims to identify whether commit instances will introduce defects into the new release of an app and provides immediate feedback to developers, which makes it well suited to mobile apps. Because within-app defect prediction needs sufficient historical data to label the commit instances, and such data are inadequate in practice, one alternative is to use a cross-project model. This work proposes a novel method, called KAL, for the cross-project JIT defect prediction task in the context of Android mobile apps. More specifically, KAL first transforms the commit instances into a high-dimensional feature space using the kernel-based principal component analysis technique to obtain representative features. Then, the adversarial learning technique is used to extract the common feature embedding for model building. Experiments are conducted on 14 Android mobile apps with four effort-aware indicators for performance evaluation. The results on 182 cross-project pairs demonstrate that the proposed KAL method performs better than 20 comparative methods.
Keywords: kernel-based principal component analysis; adversarial learning; just-in-time defect prediction; cross-project model
Cross-project software defect prediction based on multi-source data sets
10
Authors: Huang Junfu, Wang Yawen, Gong Yunzhan, Jin Dahai. The Journal of China Universities of Posts and Telecommunications (EI, CSCD), 2021, No. 4, pp. 75-87 (13 pages)
Cross-project defect prediction (CPDP) uses one or more source projects to build a defect prediction model and applies the model to the target project. There is usually a big difference between the data distributions of the source project and the target project, which makes it difficult to construct an effective defect prediction model. To alleviate the problem of negative transfer between the source project and the target project in CPDP, this paper proposes an integrated transfer adaptive boosting (TrAdaBoost) algorithm based on multi-source data sets (MSITrA). The algorithm uses an existing two-stage data filtering algorithm to obtain source project data related to the target project from multiple source projects, and then uses the integrated TrAdaBoost algorithm proposed in the paper to build a CPDP model. The experimental results on 15 public PROMISE data sets show that: 1) the proposed cross-project software defect prediction model outperforms the other tested CPDP methods; 2) in the within-project software defect prediction (WPDP) experiment, the proposed CPDP method achieves better results than the tested WPDP method.
Keywords: cross-project defect prediction; multi-source; transfer adaptive boosting; ensemble learning
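Cross-Project Defect Prediction Combining Feature Alignment and Instance Transfer
11
Authors: Li Li (李莉), Zhao Xin (赵鑫), Shi Kexin (石可欣), Su Renjia (苏仁嘉), Ren Zhenkang (任振康). Application Research of Computers (《计算机应用研究》; CSCD, PKU Core), 2023, No. 10, pp. 3091-3099 (9 pages)
To address the large distribution difference between source and target projects in cross-project defect prediction, a two-stage defect prediction method based on feature alignment and instance transfer (FAIT) is proposed. First, in the feature alignment stage, the marginal distributions of the features are aligned according to the marginal probability distribution; then, a conditional distribution mapping matrix is constructed from the source and target projects to align the conditional distributions. Finally, in the instance transfer stage, a cross-project defect prediction model is built with a TrAdaBoost method whose weight adjustment strategy has been improved. With F1 as the evaluation metric, FAIT performs best when 20% of the target project's instances are labeled, and two-step feature alignment outperforms single-step feature alignment. In addition, FAIT improves prediction performance by 10.69% and 15.04% on the AEEEM and NASA datasets, respectively. FAIT mitigates the distribution difference between source and target projects to some extent and achieves good defect prediction performance.
Keywords: cross-project defect prediction; feature alignment; maximum mean discrepancy; instance transfer; TrAdaBoost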
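The instance transfer stage builds on TrAdaBoost. As background, the following sketch shows one round of the classic TrAdaBoost reweighting, in which misclassified source instances lose weight and misclassified target instances gain weight. It is not FAIT's improved weight adjustment strategy, which the abstract does not specify; the function name, the clipping bounds and the toy inputs are assumptions.

```python
import math

def tradaboost_update(w_src, w_tgt, err_src, err_tgt, n_rounds):
    """One round of classic TrAdaBoost instance reweighting.

    w_src, w_tgt:     current weights of source / labeled target instances
    err_src, err_tgt: 0/1 flags, 1 if the current weak learner misclassified
                      the instance
    n_rounds:         total number of boosting rounds planned
    """
    # Weighted error of the weak learner on the target instances only.
    eps = sum(w * e for w, e in zip(w_tgt, err_tgt)) / sum(w_tgt)
    eps = min(max(eps, 1e-10), 0.499)  # assumed clipping keeps beta_t in (0, 1)
    beta_t = eps / (1 - eps)
    # Fixed shrink factor for source instances.
    beta = 1 / (1 + math.sqrt(2 * math.log(len(w_src)) / n_rounds))
    new_src = [w * beta ** e for w, e in zip(w_src, err_src)]      # shrink if wrong
    new_tgt = [w * beta_t ** (-e) for w, e in zip(w_tgt, err_tgt)]  # grow if wrong
    return new_src, new_tgt

# Toy round: 4 source and 4 target instances, all starting at weight 1.
new_src, new_tgt = tradaboost_update(
    w_src=[1.0, 1.0, 1.0, 1.0], w_tgt=[1.0, 1.0, 1.0, 1.0],
    err_src=[1, 0, 0, 1], err_tgt=[1, 0, 0, 0], n_rounds=10,
)
```

Source instances that the learner gets wrong are treated as dissimilar to the target project and are progressively ignored, while hard target instances are emphasized as in ordinary AdaBoost.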
Cross Project Defect Prediction via Balanced Distribution Adaptation Based Transfer Learning (Cited by: 5)
12
Authors: Zhou Xu, Shuai Pang, Tao Zhang, Xia-Pu Luo, Jin Liu, Yu-Tian Tang, Xiao Yu, Lei Xue. Journal of Computer Science & Technology (SCIE, EI, CSCD), 2019, No. 5, pp. 1039-1062 (24 pages)
Defect prediction assists the rational allocation of testing resources by detecting potentially defective software modules before products are released. When a project has no historical labeled defect data, cross project defect prediction (CPDP) is an alternative technique. CPDP uses the labeled defect data of an external project to construct a classification model that predicts the module labels of the current project. Transfer learning based CPDP methods are the current mainstream. In general, such methods aim to minimize the distribution differences between the data of the two projects. However, previous methods mainly focus on the marginal distribution difference and ignore the conditional distribution difference, which leads to unsatisfactory performance. This work uses a novel balanced distribution adaptation (BDA) based transfer learning method to narrow this gap. BDA simultaneously considers the two kinds of distribution differences and adaptively assigns different weights to them. To evaluate the effectiveness of BDA for CPDP performance, experiments are conducted on 18 projects from four datasets using six indicators (i.e., F-measure, g-means, Balance, AUC, EARecall, and EAF-measure). Compared with 12 baseline methods, BDA achieves average improvements of 23.8%, 12.5%, 11.5%, 4.7%, 34.2%, and 33.7% in terms of the six indicators, respectively, over the four datasets.
Keywords: cross-project defect prediction; transfer learning; balancing distribution; effort-aware indicator
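The balancing idea in this abstract, weighting the marginal and the conditional distribution differences against each other, can be sketched as a single distance. The sketch uses a linear-kernel simplification in which each distribution difference is the squared distance between feature means, and it assumes pseudo-labels are available for the target project; the function names, the balance factor name `mu` and the toy data are illustrative, not the paper's implementation.

```python
def mean(rows):
    """Column-wise mean of a non-empty list of equal-length rows."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def balanced_distance(xs, ys, xt, yt_pseudo, mu):
    """(1 - mu) * marginal difference + mu * conditional difference.

    xs, ys:        source rows and their labels
    xt, yt_pseudo: target rows and (assumed) pseudo-labels
    mu in [0, 1]:  0 uses only the marginal term, 1 only the conditional term
    """
    marginal = sq_dist(mean(xs), mean(xt))
    conditional = 0.0
    for c in set(ys) & set(yt_pseudo):  # classes present in both projects
        src_c = [x for x, y in zip(xs, ys) if y == c]
        tgt_c = [x for x, y in zip(xt, yt_pseudo) if y == c]
        conditional += sq_dist(mean(src_c), mean(tgt_c))
    return (1 - mu) * marginal + mu * conditional

# Made-up data: two source and two target modules with two metrics each.
xs, ys = [[0.0, 0.0], [2.0, 2.0]], [0, 1]
xt, yt = [[1.0, 1.0], [3.0, 3.0]], [0, 1]
d = balanced_distance(xs, ys, xt, yt, mu=0.5)  # 0.5 * 2.0 + 0.5 * 4.0 = 3.0
```

A transfer method in this family would search for a feature mapping that minimizes such a distance; the sketch only evaluates the distance for fixed features.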
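A Cross-Project Defect Prediction Method Based on Instance Filtering and Transfer (Cited by: 1)
13
Authors: Fan Guisheng (范贵生), Diao Xuyang (刁旭炀), Yu Huiqun (虞慧群), Chen Liqiong (陈丽琼). Computer Engineering (《计算机工程》; CAS, CSCD, PKU Core), 2020, No. 8, pp. 197-202, 209 (7 pages)
In cross-project software defect prediction, manually collected and labeled raw datasets usually contain noisy data, and the data of the source and target projects differ considerably in distribution. To address this problem, a two-stage cross-project defect prediction method, CLNI-KMM, is proposed. In the instance filtering stage, noisy instances are filtered out using the CLNI algorithm. In the instance transfer stage, the KMM algorithm adjusts the training weights of the source-project instances, and a software defect prediction model is built in combination with a small number of labeled instances from the target project. Experimental results show that, compared with the classic cross-project software defect prediction methods TCA, TNB, and NNFilter, CLNI-KMM achieves better prediction performance and is more stable.
Keywords: cross-project defect prediction; noisy data; distribution difference; instance filtering; instance transfer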
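Training Data Selection Methods in Cross-Project Defect Prediction (Cited by: 3)
14
Authors: Wang Xing (王星), He Peng (何鹏), Chen Dan (陈丹), Zeng Cheng (曾诚). Journal of Computer Applications (《计算机应用》; CSCD, PKU Core), 2016, No. 11, pp. 3165-3169, 3187 (6 pages)
Cross-project defect prediction (CPDP) uses defect data from other projects to predict defects in a target project, offering a new perspective on the limited-training-data problem faced by earlier defect prediction methods. The quality of the training data directly affects the performance of a cross-project defect prediction model, so the data selected for training should be as similar as possible to the target project. Using 34 public datasets provided by PROMISE, this work analyzes, from the perspective of training data selection, how four typical similarity measures affect cross-project prediction results and how the measures differ from one another. The results show that different similarity measures select training data of different quality; cosine similarity and the correlation coefficient work better, with a maximum improvement of 6.7%. In addition, with respect to the defect rate of the target project, cosine similarity is found to be better suited to projects whose defect rate is above 0.25.
Keywords: software quality assurance; defect prediction; cross-project defect prediction; similarity measure; data selection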
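The two similarity measures that this abstract reports as most effective can be sketched as a ranking step for choosing training projects. The sketch assumes each project is summarized by one numeric profile vector (for example, per-metric averages); the profile representation, the function names and the toy values are assumptions and are not taken from the paper.

```python
import math

def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def pearson(a, b):
    """Pearson correlation coefficient: cosine similarity of the centered vectors."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return cosine([x - ma for x in a], [y - mb for y in b])

def select_training(candidates, target, k, measure):
    """Return the names of the k candidate projects most similar to the target."""
    ranked = sorted(candidates, key=lambda name: measure(candidates[name], target), reverse=True)
    return ranked[:k]

# Made-up project profiles (three metrics each).
target = [1.0, 2.0, 3.0]
candidates = {
    "A": [2.0, 4.0, 6.0],
    "B": [3.0, 2.0, 1.0],
    "C": [1.0, 1.0, 4.0],
}
by_cosine = select_training(candidates, target, k=2, measure=cosine)
by_pearson = select_training(candidates, target, k=2, measure=pearson)
```

In this toy example both measures pick projects A and C, but they can disagree in general: cosine similarity depends on the raw metric magnitudes, whereas the correlation coefficient ignores any constant offset in a profile.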