Internet-scale open source software (OSS) pro- duction in various communities generates abundant reusable resources for software developers. However, finding the de- sired and mature software with keyword queries fr...Internet-scale open source software (OSS) pro- duction in various communities generates abundant reusable resources for software developers. However, finding the de- sired and mature software with keyword queries from a considerable number of candidates, especially for the fresher, is a significant challenge because current search services often fail to understand the semantics of user queries. In this paper, we construct a software term database (STDB) by analyzing tagging data in Stack Overflow and propose a correlationbased software search (CBSS) approach that performs correlation retrieval based on the term relevance obtained from STDB. In addition, we design a novel ranking method to optimize the initial retrieval result. We explore four research questions in four experiments, respectively, to evaluate the effectiveness of the STDB and investigate the performance of the CBSS. The experiment results show that the proposed CBSS can effectively respond to keyword-based software searches and significantly outperforms other existing search services at finding mature software.展开更多
基金The research was supported by the National Natural Science Foundation of China (Grant Nos. 61432020, 61303064, 61472430, 61502512) and National Grand R&D Plan (2016YFB 1000805).
文摘Internet-scale open source software (OSS) pro- duction in various communities generates abundant reusable resources for software developers. However, finding the de- sired and mature software with keyword queries from a considerable number of candidates, especially for the fresher, is a significant challenge because current search services often fail to understand the semantics of user queries. In this paper, we construct a software term database (STDB) by analyzing tagging data in Stack Overflow and propose a correlationbased software search (CBSS) approach that performs correlation retrieval based on the term relevance obtained from STDB. In addition, we design a novel ranking method to optimize the initial retrieval result. We explore four research questions in four experiments, respectively, to evaluate the effectiveness of the STDB and investigate the performance of the CBSS. The experiment results show that the proposed CBSS can effectively respond to keyword-based software searches and significantly outperforms other existing search services at finding mature software.