Recent studies have applied different approaches to summarizing software artifacts, yet very few efforts have been made to summarize the source code fragments available on the web. This paper investigates the feasibility of generating code fragment summaries by using supervised learning algorithms. We hire a crowd of ten individuals from the same workplace to extract source code features on a corpus of 127 code fragments retrieved from the official Eclipse and NetBeans frequently asked questions (FAQs). Human annotators suggest summary lines. Our machine learning algorithms achieve a precision of 82% and perform statistically better than existing code fragment classifiers. Evaluation of the algorithms on several statistical measures endorses our result. This result is promising: mechanisms such as data-driven crowd enlistment can improve the efficacy of existing code fragment classifiers.
Compilers are widely used infrastructure for accelerating software development and are expected to be trustworthy. In the literature, various testing technologies have been proposed to guarantee the quality of compilers. However, there remains an obstacle to comprehensively characterizing and understanding compiler testing. To overcome this obstacle, we propose a literature analysis framework to gain insights into the compiler testing area. First, we perform an extensive search to construct a dataset of compiler testing papers. Then, we conduct a bibliometric analysis of the productive authors, the influential papers, and the frequently tested compilers in our dataset. Finally, we use association rules and collaboration networks to mine the authorships and the communities of interest among researchers and keywords. Some valuable results are reported. We find that the USA is the leading country, with the most influential researchers and institutions. The most active keyword is "random testing". We also find that most researchers in the compiler testing area have broad interests but small-scale collaborations.
Funding: We would like to extend our gratitude to the individuals who dedicated their time and effort to participating in the crowdsourcing activity and the annotation of our code fragment corpus. This work was supported in part by the National Program on Key Basic Research Project (2013CB035906), in part by the New Century Excellent Talents in University (NCET-13-0073), and in part by the National Natural Science Foundation of China (Grant Nos. 61175062, 61370144).
Funding: We would like to thank all the participants for their comments on improving this paper. This research was supported by the National Key Research and Development Program of China (2018YFB1003900), the National Natural Science Foundation of China (Grant Nos. 61722202, 61772107, and 61572097), and the Fundamental Research Funds for the Central Universities (DUT18JC08).