
CodeScore-R: An Automated Robustness Metric for Assessing the Functional Correctness of Code Synthesis
Abstract Evaluation metrics are crucial in the field of code synthesis. Commonly used code evaluation metrics fall into three types: match-based, semantic-based, and execution-based. Among them, the execution-based Pass@k metric accurately assesses the functional correctness of predicted code by executing test cases. However, computing this metric incurs substantial overhead, so there is a pressing need for an automated evaluation metric that can assess the functional correctness of predicted code without test cases. In addition, a good evaluation metric should be robust: it should remain accurate when the predicted code undergoes minor changes. To this end, we propose CodeScore-R, an automated robust metric based on UniXcoder and contrastive learning, for evaluating the functional correctness of code synthesis. CodeScore-R employs sketch-based processing, syntactic-equivalent transformations, and mutation testing to mitigate the interference of identifiers, syntactic structures, and operators with the evaluation results. Experimental results show that on code generation and code migration tasks in Java and Python, CodeScore-R outperforms other evaluation metrics that do not require test cases, aligns more closely with the Pass@k metric, and is more robust.
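For context on the execution-based baseline mentioned in the abstract, the following is a minimal sketch of how Pass@k is commonly estimated from test-case results. The abstract does not give a formula; the estimator below, 1 - C(n-c, k) / C(n, k), is the widely used unbiased form, and the function name `pass_at_k` and its parameters are illustrative assumptions rather than anything taken from the paper.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate Pass@k for one problem (illustrative helper, not from the paper).

    n: number of programs sampled for the problem
    c: number of those programs that pass every test case
    k: size of the hypothetical draw being scored
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # Fewer than k failing samples: any draw of k must contain a passing one.
    if n - c < k:
        return 1.0
    # Probability that a random draw of k samples contains no passing program,
    # subtracted from 1.
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # 10 samples, 3 of which pass all tests (made-up numbers for illustration).
    print(pass_at_k(10, 3, 1))
    print(pass_at_k(10, 3, 5))
```

Obtaining `c` requires running every sampled program against the test suite, which is the overhead the abstract refers to and the reason a test-free metric is attractive.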
Authors Yang Guang; Zhou Yu; Chen Xiang; Zhang Xiangyu (College of Computer Science and Technology / College of Artificial Intelligence / College of Software, Nanjing University of Aeronautics and Astronautics, Nanjing 211106; School of Information Science and Technology, Nantong University, Nantong 226019)
Source Journal of Computer Research and Development (《计算机研究与发展》; indexed in EI, CSCD, and the Peking University Core Journals list), 2024, No. 2, pp. 291-306 (16 pages)
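Funding National Natural Science Foundation of China (61972197, 62372232); Postgraduate Research & Practice Innovation Program of Jiangsu Province (KYCX23_0396); Fundamental Research Funds for the Central Universities (NG2023005).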
Keywords code synthesis evaluation metric; functional correctness; robustness; code synthesis; neural network