Funding: Project (2018YFB1004202) supported by the National Key Research and Development Program of China; Project (61732019) supported by the National Natural Science Foundation of China; Project (SKLSDE-2018ZX-06) supported by the State Key Laboratory of Software Development Environment, China.
Abstract: As open source projects grow in popularity, the volume of incoming pull requests becomes very large, which places a heavy burden on the integrators responsible for accepting or rejecting them. An approach that predicts whether a pull request will be accepted can help integrators either reject code changes immediately or allocate more resources to overcome their deficiencies. In this paper, an approach named CTCPPre is proposed to predict accepted pull requests in GitHub. CTCPPre mainly considers code features of the modified changes, text features of the pull request description, contributor features of the developer's previous behavior, and project features of the development environment. The effectiveness of CTCPPre is evaluated on 28 projects containing 221096 pull requests. Experimental results show that CTCPPre performs well, achieving on average an accuracy of 0.82, an AUC of 0.76 and an F1-score of 0.88. It is compared with RFPredict, the state-of-the-art approach for predicting accepted pull requests. On average across the 28 projects, CTCPPre outperforms RFPredict by 6.64%, 16.06% and 4.79% in terms of accuracy, AUC and F1-score, respectively.
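The three evaluation metrics named in this abstract can be illustrated with a minimal sketch. The labels and scores below are made-up toy values, not data from the paper, and the function names are hypothetical; the code only shows how accuracy, F1-score and AUC are conventionally computed for a binary accepted/rejected prediction.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of pull requests whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Harmonic mean of precision and recall for the positive (accepted) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc(y_true: Sequence[int], scores: Sequence[float], positive: int = 1) -> float:
    """Probability that a random accepted PR is scored above a random rejected one."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one example of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (assumption): 1 = accepted, 0 = rejected.
y_true = [1, 1, 1, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 threshold is an assumption

print(accuracy(y_true, y_pred), f1_score(y_true, y_pred), auc(y_true, scores))
```

Accuracy and F1-score depend on the chosen decision threshold, whereas AUC is computed from the raw scores, which is one reason the three are usually reported together.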
Funding: Project (2016-YFB1000805) supported by the National Grand R&D Plan, China; Projects (61502512, 61432020, 61472430, 61532004) supported by the National Natural Science Foundation of China.
Abstract: Code review is an important process for reducing code defects and improving software quality. In social coding communities like GitHub, where anyone can submit Pull-Requests, code review plays a more important role than ever before, and the process is quite time-consuming. Finding and recommending proper reviewers for incoming Pull-Requests therefore becomes a vital task. However, most current studies recommend reviewers only by predicting whether they will participate, without differentiating the types of participation. In this paper, we develop a two-layer reviewer recommendation model that recommends reviewers for Pull-Requests (PRs) in GitHub projects from the technical and managerial perspectives. In the first layer, we recommend suitable developers to review the target PRs using a hybrid recommendation method. In the second layer, given the recommendation results from the first layer, we specify whether each target developer will participate in the review technically or managerially. We conducted experiments on two popular GitHub projects and tested the approach using PRs created between February 2016 and February 2017. The results show that the first layer of our recommendation model performs better than previous work, and that the second layer can effectively differentiate the types of participation.
Funding: Supported by the National Social Science Fund (NSSF) under Grant No. 22BTQ033.
Abstract: Open-source software is gradually being integrated into industrial software, while industry protocols in industrial software are gradually being transferred to open-source community development. Industrial protocol standardization organizations are confronted with numerous, fragmented code PRs (Pull Requests) and informal proposals, and differing workflows lead to increased operating costs. The open-source community maintenance team needs more intelligent software to guide the identification and classification of these issues. To solve these problems, this paper proposes a PR review prediction model based on multi-dimensional features. We extract 43 features of PRs and divide them into five dimensions: contributor, reviewer, software project, PR, and social network of developers. The model integrates these five-dimensional features, and a prediction model is built on a Random Forest Classifier to predict the review results of PRs. In addition, to improve the quality of rejected PRs, we focus on problems raised in the review process and on review comments of similar PRs. We propose a PR revision recommendation model based on a PR review knowledge graph. Entity information and relationships between entities are extracted from the text and code of PRs, historical review comments, and related issues. PR revisions are then recommended to code contributors by graph-based similarity calculation. The experimental results show that the two models are effective and robust in PR review result prediction and PR revision recommendation.
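The abstract does not say which graph-based similarity measure is used, so the following is only a sketch under a stated assumption: each PR is represented by the set of knowledge-graph entities it is linked to, and similarity is the Jaccard overlap of those sets. The entity names, PR identifiers and function names are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap of two entity sets (assumed similarity measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def recommend_revisions(
    target_entities: Set[str],
    history: Dict[str, Tuple[Set[str], List[str]]],
    top_k: int = 1,
) -> List[Tuple[str, List[str]]]:
    """Return review comments of the top_k historical PRs most similar to the target."""
    ranked = sorted(
        history.items(),
        key=lambda item: jaccard(target_entities, item[1][0]),
        reverse=True,
    )
    return [(pr_id, comments) for pr_id, (_entities, comments) in ranked[:top_k]]


# Hypothetical historical PRs: entity set plus the review comments they received.
history = {
    "pr-1": ({"file:parser.py", "label:bug", "issue:12"}, ["Add a regression test."]),
    "pr-2": ({"file:docs.md", "label:docs"}, ["Fix the heading level."]),
}
target = {"file:parser.py", "label:bug"}

print(recommend_revisions(target, history))
```

In this toy case the target shares two of three entities with `pr-1` and none with `pr-2`, so the comments attached to `pr-1` are surfaced as the revision hint.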
Funding: This work was supported by the National Key Research and Development Program of China under Grant No. 2016YFB1000805 and the National Natural Science Foundation of China under Grant Nos. 61432020, 61303064, 61472430 and 61502512.
Abstract: Code reviews in the pull-based model are open to community users on GitHub. Various participants take part in the review discussions, and the review topics concern not only the improvement of code contributions but also project evolution and social interaction. A comprehensive understanding of the review topics in the pull-based model would help to better organize the code review process and to optimize review tasks such as reviewer recommendation and pull-request prioritization. In this paper, we first conduct a qualitative study on three popular open-source software projects hosted on GitHub and construct a fine-grained two-level taxonomy covering four level-1 categories (code correctness, pull-request decision-making, project management, and social interaction) and 11 level-2 subcategories (e.g., defect detecting, reviewer assigning, contribution encouraging). Second, we conduct a preliminary quantitative analysis on a large set of review comments labeled by TSHC (a two-stage hybrid classification algorithm), which automatically classifies review comments by combining rule-based and machine-learning techniques. Through the quantitative study, we explore typical review patterns. We find that the three projects present similar comment distributions across the subcategories. Pull-requests submitted by inexperienced contributors tend to contain potential issues even though they have passed the tests. Furthermore, external contributors are more likely to break project conventions in their early contributions.
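The abstract states only that TSHC combines rule-based and machine-learning techniques in two stages; it does not describe how. The sketch below is therefore not TSHC itself but one simple way such a combination could look: hand-written keyword rules are tried first, and a tiny word-overlap classifier (a stand-in for a real learned model) handles comments that no rule matches. The rules, training examples and class names are assumptions; only the four level-1 category names come from the abstract.

```python
import re
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Stage 1 (assumed): keyword rules mapped to level-1 categories from the abstract.
RULES = [
    (re.compile(r"\b(bug|crash|fails?|broken)\b", re.I), "code correctness"),
    (re.compile(r"\b(lgtm|merge|merging|close|closing)\b", re.I), "pull-request decision-making"),
    (re.compile(r"\b(thanks|thank you|great work)\b", re.I), "social interaction"),
]


def rule_stage(comment: str) -> Optional[str]:
    """Return the label of the first matching rule, or None if no rule fires."""
    for pattern, label in RULES:
        if pattern.search(comment):
            return label
    return None


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z']+", text.lower())


class WordOverlapClassifier:
    """Stage 2 stand-in: picks the label whose training words overlap most."""

    def __init__(self) -> None:
        self.counts: Dict[str, Counter] = {}

    def fit(self, examples: List[Tuple[str, str]]) -> "WordOverlapClassifier":
        for text, label in examples:
            self.counts.setdefault(label, Counter()).update(tokenize(text))
        return self

    def predict(self, text: str) -> str:
        tokens = tokenize(text)
        return max(self.counts, key=lambda lb: sum(self.counts[lb][t] for t in tokens))


def classify(comment: str, model: WordOverlapClassifier) -> str:
    """Rules first; fall back to the learned model when no rule matches."""
    return rule_stage(comment) or model.predict(comment)


# Hypothetical labeled comments used to fit the fallback model.
model = WordOverlapClassifier().fit([
    ("please assign a reviewer and update the milestone", "project management"),
    ("this function returns the wrong value", "code correctness"),
])

for text in ["Thanks for the patch!", "LGTM, merging now", "can someone update the milestone?"]:
    print(text, "->", classify(text, model))
```

Trying rules before the model keeps high-precision patterns deterministic and leaves only the ambiguous comments to the statistical stage; a real system would replace the word-overlap stand-in with a properly trained classifier.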