Abstract
Functions are the smallest named units that aggregate behavior in most traditional programming languages. The readability of function names plays a vital role in how programmers understand what a program does and how its modules interact. Low-quality function names confuse developers, introduce code smells, and can lead to software defects caused by API misuse. To address this, a deep-learning-based method for function name consistency checking and recommendation is proposed, named DMName. First, for the source code of a given target function, the internal context, interaction context, sibling context, and enclosing context are constructed and merged into a sequence of context tokens. The token sequence is then converted into a sequence of context representation vectors using the FastText word embedding technique and fed into the encoder of a seq2seq model. A copy mechanism and a coverage mechanism are introduced to address the out-of-vocabulary (OOV) problem and the repeated decoding problem, respectively. The model outputs the vector sequence of the predicted target function name, and a dual-channel CNN classifier judges whether the function name is consistent. If the name is inconsistent, a recommended function name is obtained by direct mapping based on vector-space similarity matching. Experimental results show that DMName achieves F1 scores of 82.65% and 73.31% on the function name consistency checking task and the function name recommendation task, respectively, which are 2.01% and 2.96% higher than DeepName, the current best method. Finally, DMName is validated on lancia, a large-scale open-source project on GitHub, where it uncovers 16 function name inconsistency problems and makes reasonable name recommendations, further confirming the effectiveness of DMName.
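As a rough illustration of two steps described in the abstract (merging the four contexts into one token sequence, and mapping a predicted vector to a recommended name by vector-space similarity), the following minimal Python sketch may help. It is not the authors' implementation: the helper names `split_identifier`, `build_context_sequence`, `cosine`, and `recommend_name` are hypothetical, the subtoken splitting rule is an assumption, and the toy 2-dimensional vectors stand in for FastText embeddings and the seq2seq output.

```python
import math
import re


def split_identifier(name):
    """Split a camelCase or snake_case identifier into lowercase subtokens.

    Assumption: the abstract does not specify the tokenization rule.
    """
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [p.lower() for p in parts]


def build_context_sequence(internal, interaction, sibling, enclosing):
    """Merge the four contexts (lists of identifiers) into one token sequence."""
    tokens = []
    for group in (internal, interaction, sibling, enclosing):
        for identifier in group:
            tokens.extend(split_identifier(identifier))
    return tokens


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def recommend_name(predicted_vec, candidates):
    """Return the candidate name whose vector is most similar to the prediction.

    `candidates` maps a function name to its (hypothetical) embedding vector.
    """
    return max(candidates, key=lambda name: cosine(predicted_vec, candidates[name]))


if __name__ == "__main__":
    sequence = build_context_sequence(
        internal=["userName"],
        interaction=["load_user"],
        sibling=["setUserName"],
        enclosing=["UserService"],
    )
    print(sequence)

    # Toy vectors only; real vectors would come from the trained model.
    candidates = {"getName": [0.9, 0.2], "setName": [0.1, 1.0]}
    print(recommend_name([1.0, 0.1], candidates))
```

In this sketch the recommendation step is a plain nearest-neighbor lookup by cosine similarity; the similarity measure actually used by DMName is not stated in the abstract.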
Authors
ZHENG Wei (郑炜)
TANG Hui (唐辉)
CHEN Xiang (陈翔)
ZHANG Yong-Jie (张永杰)
Affiliations: School of Software, Northwestern Polytechnical University, Xi’an 710072, China; School of Information Science and Technology, Nantong University, Nantong 226019, China; National Engineering Laboratory for Integrated Aero-space-ground-ocean Big Data Application Technology (Northwestern Polytechnical University), Xi’an 710072, China; Key Laboratory of Big Data Storage and Management, Ministry of Industry and Information Technology (Northwestern Polytechnical University), Xi’an 710172, China
Source
Journal of Software (《软件学报》)
EI
CSCD
Peking University Core Journals
2024, No. 10, pp. 4604-4622 (19 pages)
Funding
National Key Research and Development Program of China (2020YFC0833105Z1)
National Natural Science Foundation of China (62141208)