Funding: National Natural Science Foundation of China, Grant/Award Number: 61867004; National Natural Science Foundation of China Youth Fund, Grant/Award Number: 41801288.
Abstract: Cross-Project Defect Prediction (CPDP) is a method that uses historical data from other source projects to train predictive models for defect prediction in the target project. However, existing CPDP methods only consider linear correlations between features (indicators) of the source and target projects. These models cannot evaluate non-linear correlations between features when they exist, for example, when the data distributions of the source and target projects differ. As a result, the performance of such CPDP models is compromised. This paper proposes a novel CPDP method based on the Synthetic Minority Oversampling Technique (SMOTE) and Deep Canonical Correlation Analysis (DCCA), referred to as S-DCCA. Canonical Correlation Analysis (CCA) is employed to address the issue of non-linear correlations between features of the source and target projects. S-DCCA extends CCA by incorporating the MlpNet model for feature extraction from the dataset. Redundant features are then eliminated by maximizing the correlated feature subset using the CCA loss function. Finally, cross-project defect prediction is achieved by applying the SMOTE data sampling technique. Area Under Curve (AUC) and F1 score (F1) are used as evaluation metrics. Experiments on 27 projects from four public datasets were conducted to validate the proposed method. The results demonstrate that, on average, our method outperforms all baseline approaches by at least 1.2% in AUC and 5.5% in F1 score. This indicates that the proposed method exhibits favorable performance characteristics.
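The SMOTE step named in this abstract can be illustrated with a minimal sketch. It is not the S-DCCA implementation: the function name `smote`, the toy data, and the parameters `n_new`, `k`, and `seed` are all assumptions made for illustration. The sketch shows only the core idea of SMOTE, which is to create synthetic minority-class (defective) samples by interpolating between a minority sample and one of its nearest minority neighbours.

```python
import random

def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority samples
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: sq_dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic

# Toy data (hypothetical): three defective modules with two metrics each.
defective = [(1.0, 2.0), (2.0, 3.0), (3.0, 1.0)]
new_points = smote(defective, n_new=4)
```

Every synthetic point lies on a line segment between two real minority samples, so the oversampled class stays inside the region the original defective modules occupy.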
Funding: Supported by the Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. RS-2022-00155885).
Abstract: Cross-project software defect prediction (CPDP) aims to enhance defect prediction in target projects with limited or no historical data by leveraging information from related source projects. Existing CPDP approaches rely on static metrics or dynamic syntactic features, which have shown limited effectiveness because they cannot capture higher-level system properties that are important for CPDP, such as complex design patterns, relationships between multiple functions, and dependencies across software projects. This paper introduces a graph-based feature learning model for CPDP (GB-CPDP) that uses NetworkX to extract features and learn representations of program entities from control flow graphs (CFGs) and data dependency graphs (DDGs). These graphs capture the structural and data dependencies within the source code. The approach employs Node2Vec to transform CFGs and DDGs into numerical vectors and uses Long Short-Term Memory (LSTM) networks to learn predictive models. The process involves graph construction, feature learning through graph embedding and LSTM, and defect prediction. Experimental evaluation on nine open-source Java projects from the PROMISE dataset demonstrates that GB-CPDP outperforms state-of-the-art CPDP methods in terms of F1-measure and Area Under the Curve (AUC). The results showcase the effectiveness of GB-CPDP in improving the performance of cross-project defect prediction.
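To make the graph-embedding step more concrete, the sketch below generates the random walks that Node2Vec-style methods feed to a skip-gram model. It is a simplified illustration under stated assumptions: the toy graph `cfg` and the function `random_walks` are hypothetical, the walks are uniform (the special case of Node2Vec with p = q = 1), and plain dictionaries stand in for NetworkX so the example needs no third-party packages. Training the embedding and the LSTM is omitted.

```python
import random

# Hypothetical control flow graph of a small method: node -> successors.
cfg = {
    "entry": ["cond"],
    "cond": ["then", "else"],
    "then": ["exit"],
    "else": ["exit"],
    "exit": [],
}

def random_walks(graph, walk_length, walks_per_node, seed=0):
    """Uniform random walks (Node2Vec with p = q = 1) from every node."""
    rng = random.Random(seed)
    walks = []
    for start in graph:
        for _ in range(walks_per_node):
            walk = [start]
            # Extend the walk until it is long enough or hits a dead end.
            while len(walk) < walk_length and graph[walk[-1]]:
                walk.append(rng.choice(graph[walk[-1]]))
            walks.append(walk)
    return walks

walks = random_walks(cfg, walk_length=4, walks_per_node=2)
```

Each walk is a sequence of node names that can be treated like a sentence, so nodes that appear in similar structural contexts end up with similar vectors once an embedding model is trained on the walks.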
Funding: This work is supported in part by the National Science Foundation of China (Nos. 61672392, 61373038) and in part by the National Key Research and Development Program of China (No. 2016YFC1202204).
Abstract: As software grows in scale, updating and maintaining it becomes increasingly important. However, frequent code updates make it more likely that new defects are introduced, so predicting defects in software changes quickly and accurately has become an important problem for developers. Current defect prediction methods often fail to reflect the feature information of a defect comprehensively, and their detection performance is not ideal. We therefore propose a novel defect prediction model named ITNB (Improved Transfer Naive Bayes), based on an improved transfer Naive Bayes algorithm, which considers two aspects: (1) because the edge data of the test set may affect the similarity calculation and the final prediction result, we remove the edge data of the test set when calculating the data similarity between the training set and the test set; (2) because each feature dimension affects defect prediction differently, we construct the formula for training data weights from feature dimension weights and data gravity, then calculate the prior probability and the conditional probability of the training data from the weight information, and so construct a weighted Bayesian classifier for software defect prediction. To evaluate the ITNB model, we use six datasets from large open-source projects: Bugzilla, Columba, Mozilla, JDT, Platform, and PostgreSQL. We compare the ITNB model with the Transfer Naive Bayes (TNB) model. The experimental results show that ITNB achieves better results than TNB in terms of accuracy, precision, and pd for both within-project and cross-project defect prediction.
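The abstract does not give the ITNB weight formula, so the sketch below shows only the baseline "data gravity" idea as it is commonly described for Transfer Naive Bayes. This is an assumption, not the authors' formula. A training row gets similarity s equal to the number of its features that fall inside the range observed in the test set, and weight s / (k - s + 1)^2 for k features. The function name and the toy rows are hypothetical, and ITNB's edge-data removal and feature-dimension weights are not modelled.

```python
def gravity_weights(train, test):
    """Weight each training row by how many of its features fall inside
    the [min, max] range that the test set shows for that feature."""
    k = len(test[0])  # number of features
    lows = [min(row[j] for row in test) for j in range(k)]
    highs = [max(row[j] for row in test) for j in range(k)]
    weights = []
    for row in train:
        # s = number of features of this row inside the test-set range
        s = sum(lows[j] <= row[j] <= highs[j] for j in range(k))
        weights.append(s / (k - s + 1) ** 2)
    return weights

# Toy data (hypothetical): three metrics per module.
test_rows = [(10, 1.0, 3), (20, 2.0, 5)]
train_rows = [(15, 1.5, 4), (15, 9.0, 4), (99, 9.0, 9)]
weights = gravity_weights(train_rows, test_rows)
print(weights)  # [3.0, 0.5, 0.0]
```

Training rows that resemble the test data on every feature receive the largest weight, and rows outside the test ranges on every feature receive zero, which is how such weights steer the prior and conditional probabilities of a weighted Naive Bayes classifier toward the target project.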
Funding: This paper was supported by the National Natural Science Foundation of China (61772286, 61802208, and 61876089), China Postdoctoral Science Foundation Grant 2019M651923, and the Natural Science Foundation of Jiangsu Province of China (BK0191381).
Abstract: Cross-project defect prediction (CPDP) aims to predict defects in a target project by using a prediction model built on source projects. The main problem in CPDP is the large distribution gap between the source project and the target project, which prevents the prediction model from performing well. Most existing methods overlook the class discrimination of the learned features, and finding an effective model that transfers from the source project to the target project is challenging. In this paper, we propose an unsupervised domain adaptation approach for CPDP based on discriminative subspace learning (DSL). DSL treats the data from two projects as coming from two domains and maps the data into a common feature space. It employs cross-domain alignment with discriminative information from different projects to reduce the distribution difference between projects while incorporating class-discriminative information. Specifically, DSL first uses subspace-learning-based domain adaptation to reduce the distribution gap between projects. It then makes full use of the class label information of the source project and transfers the discrimination ability of the source project to the target project in the common space. Comprehensive experiments on five projects verify that DSL can build an effective prediction model and improves on the related competing methods by at least 7.10% and 11.08% in terms of G-measure and AUC, respectively.
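For readers unfamiliar with G-measure, the sketch below computes it from a confusion matrix. It assumes the definition commonly used in defect prediction studies, the harmonic mean of the probability of detection (pd, i.e. recall) and specificity (1 - pf). The abstract does not state the exact formula, and the counts are invented for illustration.

```python
def g_measure(tp, fp, tn, fn):
    """Harmonic mean of recall (pd) and specificity (1 - pf)."""
    pd = tp / (tp + fn)           # probability of detection (recall)
    specificity = tn / (tn + fp)  # 1 - probability of false alarm
    if pd + specificity == 0:
        return 0.0
    return 2 * pd * specificity / (pd + specificity)

# Hypothetical counts: 40 defective and 100 clean modules.
score = g_measure(tp=30, fp=20, tn=80, fn=10)
```

With these counts pd is 0.75 and specificity is 0.80, so the score is about 0.774. Unlike accuracy, the measure stays low if a model ignores the defective minority class.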
Funding: Supported by the Army Weapons and Equipment Internal Research (No. LJ20191C080690).
Abstract: Software Defect Prediction (SDP) technology is an effective tool for improving software system quality and has attracted much attention in recent years. However, prediction on cross-project data remains a challenge for traditional SDP methods because the training and testing datasets follow different distributions. Another major difficulty that must be addressed in Cross-Project Defect Prediction (CPDP) is class imbalance. In this work, we propose a transfer-learning algorithm (TSboostDF) that considers both knowledge transfer and class imbalance for CPDP. The experimental results demonstrate that TSboostDF performs better than existing CPDP methods.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 62072060).
Abstract: With the boom in mobile devices, Android mobile apps play an irreplaceable role in people's daily lives. These apps are updated frequently, with many code commits made to meet new requirements. Just-in-Time (JIT) defect prediction aims to identify whether commit instances will introduce defects into the new release of an app and provides immediate feedback to developers, which makes it well suited to mobile apps. Because within-app defect prediction needs sufficient historical data to label the commit instances, and such data is often inadequate in practice, one alternative is to use a cross-project model. In this work, we propose a novel method, called KAL, for cross-project JIT defect prediction in the context of Android mobile apps. More specifically, KAL first transforms the commit instances into a high-dimensional feature space using kernel-based principal component analysis to obtain representative features. Adversarial learning is then used to extract a common feature embedding for model building. We conduct experiments on 14 Android mobile apps and employ four effort-aware indicators for performance evaluation. The results on 182 cross-project pairs demonstrate that the proposed KAL method performs better than 20 comparative methods.
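The figure of 182 cross-project pairs is consistent with treating every ordered (source app, target app) combination of the 14 apps as one experiment, since 14 × 13 = 182. The snippet below checks that arithmetic. The pairing scheme is an inference from the numbers in the abstract, and the app names are placeholders.

```python
from itertools import permutations

apps = [f"app{i}" for i in range(1, 15)]  # 14 placeholder app names
pairs = list(permutations(apps, 2))       # ordered (source, target) pairs
print(len(pairs))  # 182
```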
Abstract: Cross-project defect prediction (CPDP) uses one or more source projects to build a defect prediction model and applies the model to the target project. The data distributions of the source project and the target project usually differ considerably, which makes it difficult to construct an effective defect prediction model. To alleviate negative transfer between the source project and the target project in CPDP, this paper proposes an integrated transfer adaptive boosting (TrAdaBoost) algorithm based on multi-source data sets (MSITrA). The algorithm uses an existing two-stage data filtering algorithm to select source project data related to the target project from multiple source projects, and then uses the integrated TrAdaBoost algorithm proposed in the paper to build a CPDP model. Experimental results on 15 public PROMISE data sets show that: 1) the proposed cross-project defect prediction model performs best among all tested CPDP methods; 2) in the within-project software defect prediction (WPDP) experiment, the proposed CPDP method achieves better results than the tested WPDP method.
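As background for the TrAdaBoost component, the sketch below shows one reweighting round of the original single-source TrAdaBoost algorithm, not the integrated multi-source MSITrA variant, whose details the abstract does not give. The function name, the toy weights, and the error flags are hypothetical. The sketch also assumes a small set of labelled target instances, which TrAdaBoost requires.

```python
import math

def tradaboost_update(w_src, w_tgt, err_src, err_tgt, n_rounds):
    """One TrAdaBoost reweighting step.

    err_src / err_tgt are 0/1 lists: 1 where the current weak learner
    misclassified the corresponding source / target instance."""
    # Weighted error of the weak learner on the labelled target instances.
    eps = sum(w * e for w, e in zip(w_tgt, err_tgt)) / sum(w_tgt)
    eps = min(eps, 0.499)  # the update is only defined for eps < 0.5
    beta_t = eps / (1 - eps)
    beta = 1 / (1 + math.sqrt(2 * math.log(len(w_src)) / n_rounds))
    # Misclassified source instances lose weight (they look unlike the target).
    new_src = [w * beta ** e for w, e in zip(w_src, err_src)]
    # Misclassified target instances gain weight, as in ordinary AdaBoost.
    new_tgt = [w * beta_t ** (-e) for w, e in zip(w_tgt, err_tgt)]
    return new_src, new_tgt

new_src, new_tgt = tradaboost_update(
    w_src=[1.0] * 4, w_tgt=[1.0] * 4,
    err_src=[1, 0, 0, 1], err_tgt=[0, 1, 0, 0],
    n_rounds=10,
)
```

Shrinking the weight of source instances that the learner gets wrong is what limits negative transfer, because source data that conflicts with the target distribution gradually stops influencing the model.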