Funding: This work is supported by the National Natural Science Foundation of China under Grant Nos. 61602403 and 61402406 and the National Key Technology Research and Development Program of the Ministry of Science and Technology of China under Grant No. 2015BAH17F01.
Abstract: In practice, some bugs have more impact than others and thus deserve more immediate attention. Because of tight schedules and limited human resources, developers may not have enough time to inspect all bugs, so they often concentrate on the highly impactful ones. In the literature, high-impact bugs are bugs that appear at unexpected times or locations and bring unexpected effects (i.e., surprise bugs), or that break pre-existing functionality and destroy the user experience (i.e., breakage bugs). Unfortunately, identifying high-impact bugs among thousands of bug reports in a bug tracking system is no easy feat. An automated technique that identifies high-impact bug reports can therefore help developers become aware of them early, rectify them quickly, and minimize the damage they cause. Because only a small proportion of bugs are high-impact, identifying high-impact bug reports is a difficult task. In this paper, we propose an approach to identify high-impact bug reports by leveraging imbalanced learning strategies. We investigate the effectiveness of various variants, each of which combines one imbalanced learning strategy with one classification algorithm. In particular, we choose four widely used strategies for dealing with imbalanced data and four state-of-the-art text classification algorithms, and conduct experiments on four datasets from four different open source projects. We mainly perform an analytical study on two types of high-impact bugs, i.e., surprise bugs and breakage bugs. The results show that different variants perform differently, and that the best-performing variants, SMOTE (synthetic minority over-sampling technique) + KNN (K-nearest neighbours) for surprise bug identification and RUS (random under-sampling) + NB (naive Bayes) for breakage bug identification, outperform the two state-of-the-art approaches by Thung et al. and Garcia and Shihab in terms of F1-score.
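To make the random under-sampling (RUS) strategy named in the abstract concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the function name `random_under_sample`, the toy report texts, and the choice to shrink every class to the size of the rarest one are all hypothetical.

```python
import random
from collections import Counter


def random_under_sample(samples, labels, seed=0):
    """Randomly drop items until every class is as small as the rarest class.

    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = random.Random(seed)

    # Group the samples by their class label.
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)

    # The rarest class sets the target size for all classes.
    target = min(len(items) for items in by_class.values())

    balanced = []
    for label, items in by_class.items():
        # Sample without replacement, so no report is duplicated.
        for sample in rng.sample(items, target):
            balanced.append((sample, label))

    rng.shuffle(balanced)
    return [s for s, _ in balanced], [l for _, l in balanced]


# Toy data (made up): 2 high-impact reports (label 1) among 8.
reports = [f"report {i}" for i in range(8)]
labels = [1, 0, 0, 0, 1, 0, 0, 0]

X_bal, y_bal = random_under_sample(reports, labels)
print(Counter(y_bal))  # both classes now have the same number of reports
```

The balanced set would then be passed to a text classifier such as naive Bayes. Under-sampling discards majority-class reports, which is the trade-off against over-sampling strategies such as SMOTE, which instead synthesize additional minority-class examples.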
Funding: Supported by the National Key R&D Program of China (2018YFB1702700).
Abstract: Existing software bug localization models treat the source file as natural language, which loses the syntactic and structural information of the source file. A bug localization model based on the syntactic and semantic information of source code is proposed. Firstly, the abstract syntax tree (AST) is divided by node category to obtain a statement sequence, and each statement tree is encoded into a vector to capture lexical and syntactic knowledge at the statement level. Secondly, the source code is transformed into a vector representation using the sequence naturalness of the statements. This avoids the vanishing and exploding gradient problems caused by a large AST when an AST is used to represent the source code. Finally, the correlation between bug reports and source files is analyzed comprehensively from three aspects, namely syntax, semantics, and text, to locate the buggy code. Experiments show that, compared with other standard models, the proposed model improves bug localization performance and shows advantages in mean reciprocal rank (MRR), mean average precision (MAP), and Top N Rank.
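The three ranking metrics named in the abstract can be sketched as follows. This is a minimal illustration using the common textbook definitions, which the abstract itself does not spell out: MRR averages the reciprocal rank of the first buggy file, MAP averages the per-report average precision (normalized here by the number of buggy files), and Top N is the fraction of reports with at least one buggy file in the first N results. The file names and function names are hypothetical.

```python
def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant file, or 0.0 if none is retrieved."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def average_precision(ranked, relevant):
    """Mean of precision@k over the ranks k at which a relevant file appears."""
    hits, total = 0, 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def top_n_hit(ranked, relevant, n):
    """True if at least one relevant file is among the first n results."""
    return any(item in relevant for item in ranked[:n])


def evaluate(queries, n=1):
    """Return (MRR, MAP, Top-N rate) over (ranked list, relevant set) pairs."""
    count = len(queries)
    mrr = sum(reciprocal_rank(r, rel) for r, rel in queries) / count
    mean_ap = sum(average_precision(r, rel) for r, rel in queries) / count
    top_n = sum(top_n_hit(r, rel, n) for r, rel in queries) / count
    return mrr, mean_ap, top_n


# Two made-up bug reports: ranked candidate files and the truly buggy files.
queries = [
    (["a.java", "b.java", "c.java"], {"b.java"}),
    (["d.java", "e.java", "f.java"], {"d.java", "f.java"}),
]
mrr, mean_ap, top1 = evaluate(queries, n=1)
print(f"MRR={mrr:.3f} MAP={mean_ap:.3f} Top1={top1:.3f}")
# → MRR=0.750 MAP=0.667 Top1=0.500
```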
Funding: This work was supported in part by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. NRF-2020R1A2C1013308).
Abstract: Automation software needs to be continuously updated by addressing the software bugs contained in its repositories. However, bugs have different levels of importance; hence, it is essential to prioritize bug reports based on their severity and importance. Manually managing the deluge of incoming bug reports strains the time and resources of the development team and delays the resolution of critical bugs. Therefore, bug report prioritization is vital. This study proposes a new model for bug prioritization based on the averaged one-dependence estimator; it prioritizes bug reports based on severity, which is determined by the number of attributes: the more attributes, the higher the severity. The proposed model is evaluated using precision, recall, F1-score, accuracy, G-measure, and Matthews correlation coefficient. The results of the proposed model are compared with those of the support vector machine (SVM) and naive Bayes (NB) models. Eclipse and Mozilla datasets were used as the sources of bug reports. The proposed model improved bug repository management and outperformed the SVM and NB models. Additionally, the proposed model used a weaker attribute independence assumption than the former models, thereby improving prediction accuracy with minimal computational cost.
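Most of the evaluation metrics listed in the abstract can be computed from a confusion matrix. The sketch below assumes a binary setting (for example, one priority level against the rest) and uses the standard definitions; the abstract does not state how the study aggregates over priority levels, the counts are made up, and G-measure is omitted because its definition varies between papers.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Standard binary classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / (tp + fp + fn + tn)

    # Matthews correlation coefficient; defined as 0.0 when a margin is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Made-up counts for illustration only.
m = binary_metrics(tp=40, fp=10, fn=20, tn=30)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which makes it more informative when severe bug reports are rare.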
Abstract: In software development projects, bugs are a common phenomenon. Developers report bugs in open source repositories. There is a need for a high-quality developer prediction model that considers developer work satisfaction, stays within a limited development cost, and improves bug resolution time. When a new bug is reported, the main focus of the triager is to address and resolve the bug report as soon as possible. Thus, developer work efficiency is an important factor in bug fixing. To address these issues, the proposed approach recommends a set of developers who could potentially share their knowledge with each other to fix new bug reports. The approach is called developer working efficiency and social network based developer recommendation (DweSn). It is a composite model that builds each developer's profile from the developer's average bug fixing time, work efficiency in fixing a variety of bugs, and social interactions with other developers. A similarity measure is applied between the new bug and the bugs in the corpus to extract a list of capable developers from the corpus. The proposed approach selects only those developers who are active and less loaded with work. The developer with the highest profile score is assigned the bug. We evaluated our approach on subsets of five large open-source projects, namely Mozilla, Netbeans, Eclipse, Firefox and OpenOffice, and compared it with the state of the art. The results demonstrate that combining developers' efficiency with their average bug fixing time and their social network interactions gives good accuracy and efficiently reduces bug tossing length. This approach shows an improvement in prediction accuracy, precision, recall, F-score and reduced bug tossing length of up to 93.89%, 93.12%, 93.46%, 93.27% and 93.25%, respectively. The proposed approach achieved a 93% hit ratio and a 93.34% mean reciprocal rank, indicating that the proposed triager is able to efficiently assign bugs to the correct developers.