Abstract
Source code migration techniques aim to convert source code from one programming language to another, reducing the burden on developers when migrating software projects. Existing studies mainly use neural machine translation (NMT) models to convert source code into target code, but they ignore code structure features, resulting in poor migration performance. Therefore, this study proposes CSMAT (code-statement masked attention Transformer), a source code migration model based on a code-statement masked attention mechanism. The model uses the Transformer's masked attention mechanism to guide the model, during encoding, to understand the syntax and semantics of source code statements as well as inter-statement contextual features, and, during decoding, to focus on and align source code statements, thereby improving source code migration performance. Empirical studies are conducted on the real-project dataset CodeTrans, and model performance is evaluated with four metrics. The experimental results validate the effectiveness of CSMAT and the applicability of the code-statement masked attention mechanism to pretrained models.
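As a rough illustration of the statement-level masking idea described in the abstract (this is not the authors' implementation; the statement-boundary encoding, the mask convention, and the PyTorch API usage are assumptions), the sketch below builds an attention mask that restricts each token to attend only within its own code statement:

```python
# Illustrative sketch only: a statement-level attention mask for a Transformer,
# assuming each token is tagged with the index of the code statement it belongs to.
import torch

def statement_mask(stmt_ids: torch.Tensor) -> torch.Tensor:
    """stmt_ids: (seq_len,) tensor; stmt_ids[i] is the statement index of token i.
    Returns a (seq_len, seq_len) boolean mask where True marks positions that
    must NOT be attended to (the attn_mask convention of nn.MultiheadAttention)."""
    same_stmt = stmt_ids.unsqueeze(0) == stmt_ids.unsqueeze(1)
    return ~same_stmt  # block attention across statement boundaries

# Example: 6 tokens forming two statements of 3 tokens each.
ids = torch.tensor([0, 0, 0, 1, 1, 1])
mask = statement_mask(ids)

attn = torch.nn.MultiheadAttention(embed_dim=16, num_heads=4, batch_first=True)
x = torch.randn(1, 6, 16)                 # (batch, seq_len, embed_dim)
out, _ = attn(x, x, x, attn_mask=mask)    # each token attends only within its statement
```

In CSMAT's setting, cross-statement context must presumably still be modeled (the abstract mentions inter-statement contextual features during encoding and statement alignment during decoding), so such a mask would only be one component of the full attention scheme described in the paper.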
Authors
XU Ming-Rui, LI Zheng, LIU Yong, WU Yong-Hao (College of Information Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China)
Source
Computer Systems & Applications (《计算机系统应用》), 2023, No. 9, pp. 77-88 (12 pages)
Funding
National Natural Science Foundation of China (61902015, 61872026).
Keywords
code statement
mask
code migration
machine translation
attention mechanism