
Automatic Code Semantic Tag Generation Approach Based on Software Knowledge Graph (cited by: 1)
Abstract: Code snippets in open-source and enterprise software projects, and those posted on software development websites, are important software development resources. However, developers' code search needs often reflect high-level intentions and topics, which are difficult to satisfy precisely with information retrieval techniques based on code text. Semantic tags that reflect the overall intention and topic of a code snippet are therefore important for improving code search and assisting code comprehension. Existing tag generation techniques mainly target textual content or rely on historical data, and cannot meet the needs of large-scale code semantic annotation or of assisting search and comprehension. To address this problem, this study proposes KGCodeTagger, an approach that automatically generates semantic tags for code snippets based on a software knowledge graph. KGCodeTagger constructs the software knowledge graph from concepts and relations extracted from API documentation and software development Q&A text, and uses it as the basis for tag generation. Given a code snippet, KGCodeTagger identifies and extracts general API invocations and concept mentions, and links them to the corresponding concepts in the software knowledge graph. It then identifies other concepts related to the linked concepts as candidates, and ranks the candidates by diversity and representativeness to produce the final semantic tags. Each step of the knowledge graph construction is evaluated experimentally, and the quality of the generated tags is evaluated by comparison with several existing baseline approaches. The results show that the knowledge graph construction steps are reasonable and effective, and that the generated semantic tags are of high quality and meaningful, helping developers quickly understand the intention of the code.
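The pipeline described in the abstract (link API mentions to graph concepts, expand to related concepts, then rank candidates) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy `GRAPH`, the function names `link_concepts` and `generate_tags`, and the greedy scoring, which treats a candidate's number of supporting linked concepts as "representativeness" and its number of not-yet-covered linked concepts as "diversity". KGCodeTagger's actual graph schema and ranking method are not given in this record.

```python
import re

# Hypothetical toy graph: API concept -> related higher-level concepts.
# These entries are illustrative only, not data from the paper.
GRAPH = {
    "FileReader": {"file reading", "I/O"},
    "BufferedReader": {"file reading", "buffering", "I/O"},
    "HashMap": {"key-value store", "hashing"},
}


def link_concepts(code, graph):
    """Return the graph concepts that appear as identifiers in the snippet."""
    tokens = set(re.findall(r"[A-Za-z_]\w*", code))
    return sorted(tokens & set(graph))


def generate_tags(code, graph, k=3):
    """Greedily pick up to k tags from the neighbors of the linked concepts."""
    # Map each candidate tag to the linked concepts that point to it.
    support = {}
    for concept in link_concepts(code, graph):
        for neighbor in graph[concept]:
            support.setdefault(neighbor, set()).add(concept)

    tags, covered = [], set()
    while support and len(tags) < k:
        # Assumed scoring: prefer candidates covering linked concepts that no
        # chosen tag covers yet (diversity), then candidates with more overall
        # support (representativeness). Sorting keeps tie-breaking stable.
        best = max(
            sorted(support),
            key=lambda c: (len(support[c] - covered), len(support[c])),
        )
        tags.append(best)
        covered |= support.pop(best)
    return tags


snippet = (
    "BufferedReader r = new BufferedReader(new FileReader(path)); "
    "HashMap<String, Integer> m = new HashMap<>();"
)
print(generate_tags(snippet, GRAPH, k=2))
```

The greedy loop is one simple way to balance the two criteria: once a tag covering the file-related APIs has been chosen, the next pick favors a tag that explains the remaining `HashMap` usage rather than a second file-related tag.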
Authors: XING Shuang-Shuang (邢双双); LIU Ming-Wei (刘名威); PENG Xin (彭鑫) (School of Computer Science, Fudan University, Shanghai 201203, China; Shanghai Key Laboratory of Data Science (Fudan University), Shanghai 201203, China)
Source: Journal of Software (《软件学报》; indexed in EI, CSCD, and the Peking University Core Journals list), 2022, No. 11, pp. 4027-4045 (19 pages)
Funding: National Natural Science Foundation of China (61972098).
Keywords: program comprehension; code search; knowledge graph; semantic tag