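Abstract
During project development, developers write code to implement specific functions. When they are unsure how to implement a pending function in a particular programming language, they often search for code in documentation or online resources. The effectiveness of code query therefore directly affects the efficiency of software development. A considerable number of tools already exist to assist developers with code query, but they commonly suffer from complex input formats or low matching precision. This paper proposes CodeSearcher, a code query approach based on natural language descriptions of functionality. CodeSearcher converts the question-and-answer records of Stack Overflow, a Q&A website dedicated to software development, into 〈natural language description, code snippet〉 data pairs, and uses a neural network model to map "natural language descriptions" and "code snippets" into the same vector space and match them, so that developers can query for code using a natural language description of the function to be developed. CodeSearcher differs from typical code query systems in two respects. First, it requires only the code itself and does not depend on code comments or documentation, so it supports more code query scenarios. Second, it extends the code query workflow beyond a one-shot query-and-feedback process by adding a code question-and-answer step, which uses the differing elements among the returned code snippets to help developers select the target code without reading every returned snippet in detail. Experimental results show that CodeSearcher performs better than the baseline.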
When developers must implement a function but do not know how to do so in a specific programming language, they usually perform a code query using natural language. Performing code queries while programming is time-consuming and labor-intensive. Many code query tools have been proposed over the past years to assist developers, but most of these approaches require complex inputs or have low precision. We propose a new code query approach, CodeSearcher, based on natural language descriptions. Relying on 〈natural language description, code snippet〉 data pairs extracted from Stack Overflow, a Q&A website for software development, we design a neural network model and a corresponding training method to map "natural language descriptions" and "code snippets" to the same vector space. CodeSearcher differs from conventional code query systems in two ways. On the one hand, it accepts all kinds of user-provided code bases for searching, because the system relies only on the source code and not on its comments or descriptions. On the other hand, it no longer limits the code query process to "entering a natural language description and receiving code snippets", but adds a code Q&A stage that helps users pick the appropriate code snippet by its characteristic keywords, so that developers do not have to read all returned code snippets in detail. The experimental results show that CodeSearcher achieves higher precision than the baseline.
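The two ideas in the abstract, matching descriptions and snippets in a shared vector space and then using the elements that differ among returned snippets to help the user choose, can be illustrated with a minimal sketch. This is not the authors' model: the `embed` function below is a hypothetical bag-of-words stand-in for the neural encoders, and all function names (`tokenize`, `embed`, `cosine`, `rank_snippets`, `distinguishing_tokens`) are assumptions introduced only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (toy tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text):
    """Hypothetical stand-in for a learned encoder: a bag-of-words count vector.

    The paper maps descriptions and code into one space with a neural network;
    token counts are used here only so the sketch runs without any training.
    """
    return Counter(tokenize(text))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_snippets(query, snippets, top_k=2):
    """Return the top_k snippets most similar to the natural language query."""
    q = embed(query)
    ranked = sorted(snippets, key=lambda s: cosine(q, embed(s)), reverse=True)
    return ranked[:top_k]


def distinguishing_tokens(snippets):
    """For each snippet, list tokens that occur in no other returned snippet."""
    token_sets = [set(tokenize(s)) for s in snippets]
    result = []
    for i, tokens in enumerate(token_sets):
        others = set().union(*(ts for j, ts in enumerate(token_sets) if j != i))
        result.append(sorted(tokens - others))
    return result


snippets = [
    "with open(path) as f: data = f.read()",
    "with open(path) as f: lines = f.readlines()",
    "import json; obj = json.loads(text)",
]
top = rank_snippets("open a file and read lines", snippets, top_k=2)
hints = distinguishing_tokens(top)
```

In this toy run the two file-reading snippets outrank the JSON snippet, and `hints` holds the tokens unique to each returned snippet, which a Q&A step could present to the user instead of the full code.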
Authors
陆龙龙
陈统
潘敏学
张天
LU Long-long; CHEN Tong; PAN Min-xue; ZHANG Tian (State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China)
Source
《计算机科学》
CSCD
PKU Core Journal
2020, Issue 9, pp. 1-9 (9 pages)
Computer Science
Funding
National Natural Science Foundation of China (61972193)
Fundamental Research Funds for the Central Universities (14380022, 14380020)