In current software defect prediction (SDP) research, most previous empirical studies only use datasets provided by PROMISE repository and this may cause a threat to the external validity of previous empirical results...In current software defect prediction (SDP) research, most previous empirical studies only use datasets provided by PROMISE repository and this may cause a threat to the external validity of previous empirical results. Instead of SDP dataset sharing, SDP model sharing is a potential solution to alleviate this problem and can encourage researchers in the research community and practitioners in the industrial community to share more models. However, directly sharing models may result in privacy disclosure, such as model inversion attack. To the best of our knowledge, we are the first to apply differential privacy (DP) to privacy-preserving SDP model sharing and then propose a novel method DP-Share, since DP mechanisms can prevent this attack when the privacy budget is carefully selected. In particular, DP-Share first performs data preprocessing for the dataset, such as over-sampling for minority instances (i.e., defective modules) and conducting discretization for continuous features to optimize privacy budget allocation. Then, it uses a novel sampling strategy to create a set of training sets. Finally it constructs decision trees based on these training sets and these decision trees can form a random forest (i.e., model). The last phase of DP-Share uses Laplace and exponential mechanisms to satisfy the requirements of DP. In our empirical studies, we choose nine experimental subjects from real software projects. Then, we use AUC (area under ROC curve) as the performance measure and holdout as our model validation technique. After privacy and utility analysis, we find that DP-Share can achieve better performance than a baseline method DF-Enhance in most cases when using the same privacy budget. Moreover, we also provide guidelines to effectively use our proposed method. Our work attempts to fill the research gap in terms of differential privacy for SDP, which can encourage researchers and practitioners to share more SDP models and then effectively advance the state of the art of SDP.展开更多
With the widespread use of agile software development methods,such as agile and scrum,software is iteratively updated more frequently.To ensure the quality of the software,regression testing is conducted before new ve...With the widespread use of agile software development methods,such as agile and scrum,software is iteratively updated more frequently.To ensure the quality of the software,regression testing is conducted before new versions are released.Moreover,to improve the efficiency of regression testing,testing efforts should be concentrated on the modified and impacted parts of a program.However,the costs of manually constructing new test cases for the modified and impacted parts are relatively expensive.Fuzz testing is an effective method for generating test data automatically,but it is usually devoted to achieving higher code coverage,which makes fuzz testing unsuitable for direct regression testing scenarios.For this reason,we propose a fuzz testing method based on the guidance of historical version information.First,the differences between the program being tested and the last version are analyzed,and the results of the analysis are used to locate change points.Second,change impact analysis is performed to find the corresponding impacted basic blocks.Finally,the fitness values of test cases are calculated according to the execution traces,and new test cases are generated iteratively by the genetic algorithm.Based on the proposed method,we implement a prototype tool DeltaFuzz and conduct experiments on six open-source projects.Compared with the fuzzing tool AFLGo,AFLFast and AFL,DeltaFuzz can reach the target faster,and the time taken by DeltaFuzz was reduced by 20.59%,30.05%and 32.61%,respectively.展开更多
基金partially supported by the National Natural Science Foundation of China under Grant Nos. 61702041 and 61872263the Open Project of State Key Laboratory for Novel Software Technology at Nanjing University under Grant No. KFKT2019B14+2 种基金the Science and Technology Project of Beijing Municipal Education Commission under Grant No. KM201811232016the Nantong Application Research Plan under Grant No. JC2018134Jiangsu Government Scholarship for Overseas Studies.
文摘In current software defect prediction (SDP) research, most previous empirical studies only use datasets provided by PROMISE repository and this may cause a threat to the external validity of previous empirical results. Instead of SDP dataset sharing, SDP model sharing is a potential solution to alleviate this problem and can encourage researchers in the research community and practitioners in the industrial community to share more models. However, directly sharing models may result in privacy disclosure, such as model inversion attack. To the best of our knowledge, we are the first to apply differential privacy (DP) to privacy-preserving SDP model sharing and then propose a novel method DP-Share, since DP mechanisms can prevent this attack when the privacy budget is carefully selected. In particular, DP-Share first performs data preprocessing for the dataset, such as over-sampling for minority instances (i.e., defective modules) and conducting discretization for continuous features to optimize privacy budget allocation. Then, it uses a novel sampling strategy to create a set of training sets. Finally it constructs decision trees based on these training sets and these decision trees can form a random forest (i.e., model). The last phase of DP-Share uses Laplace and exponential mechanisms to satisfy the requirements of DP. In our empirical studies, we choose nine experimental subjects from real software projects. Then, we use AUC (area under ROC curve) as the performance measure and holdout as our model validation technique. After privacy and utility analysis, we find that DP-Share can achieve better performance than a baseline method DF-Enhance in most cases when using the same privacy budget. Moreover, we also provide guidelines to effectively use our proposed method. Our work attempts to fill the research gap in terms of differential privacy for SDP, which can encourage researchers and practitioners to share more SDP models and then effectively advance the state of the art of SDP.
基金supported by the Leading-Edge Technology Program of Jiangsu Natural Science Foundation of China under Grant No.BK20202001the National Natural Science Foundation of China under Grant No.61702041the Beijing Information Science and Technology University“Qin-Xin Talent”Cultivation Project under Grant No.QXTCP C201906.
文摘With the widespread use of agile software development methods,such as agile and scrum,software is iteratively updated more frequently.To ensure the quality of the software,regression testing is conducted before new versions are released.Moreover,to improve the efficiency of regression testing,testing efforts should be concentrated on the modified and impacted parts of a program.However,the costs of manually constructing new test cases for the modified and impacted parts are relatively expensive.Fuzz testing is an effective method for generating test data automatically,but it is usually devoted to achieving higher code coverage,which makes fuzz testing unsuitable for direct regression testing scenarios.For this reason,we propose a fuzz testing method based on the guidance of historical version information.First,the differences between the program being tested and the last version are analyzed,and the results of the analysis are used to locate change points.Second,change impact analysis is performed to find the corresponding impacted basic blocks.Finally,the fitness values of test cases are calculated according to the execution traces,and new test cases are generated iteratively by the genetic algorithm.Based on the proposed method,we implement a prototype tool DeltaFuzz and conduct experiments on six open-source projects.Compared with the fuzzing tool AFLGo,AFLFast and AFL,DeltaFuzz can reach the target faster,and the time taken by DeltaFuzz was reduced by 20.59%,30.05%and 32.61%,respectively.