Abstract
Recent research shows that statistical language models built from code features extracted from large amounts of source code predict code well. However, existing statistical language models usually take the textual information in the code as feature words and make insufficient use of its syntactic structure, so prediction accuracy still has room for improvement. To improve code prediction performance, this paper proposes the concept of the constraint relation of methods. On this basis, it studies the method invocation sequences of Java objects, abstracts code features, and builds a statistical language model to perform code prediction. It also studies the range of applicability, within the Java language, of the prediction model based on method constraint relations. Experiments show that the method improves accuracy by 8% over existing models.
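To make the idea of a statistical language model over method invocation sequences concrete, here is a minimal sketch. It is not the authors' model: it is a plain bigram counter, it does not implement the method constraint relation (which the abstract does not define), and the class name `MethodBigramModel`, its methods, and the toy sequences are all hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical illustration only: a bigram model over the sequence of methods
// invoked on one object. All names here are invented for this sketch and the
// paper's constraint-relation features are not modeled.
class MethodBigramModel {
    // counts.get(prev).get(next) = how often `next` directly followed `prev`
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();

    // Record one observed method invocation sequence.
    void train(List<String> sequence) {
        for (int i = 0; i + 1 < sequence.size(); i++) {
            counts.computeIfAbsent(sequence.get(i), k -> new HashMap<>())
                  .merge(sequence.get(i + 1), 1, Integer::sum);
        }
    }

    // Most frequently observed successor of `prev`, or empty if `prev` is unseen.
    Optional<String> predictNext(String prev) {
        Map<String, Integer> successors = counts.get(prev);
        if (successors == null) {
            return Optional.empty();
        }
        return successors.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey);
    }

    // Maximum-likelihood estimate of P(next | prev); 0.0 if `prev` is unseen.
    double probability(String prev, String next) {
        Map<String, Integer> successors = counts.get(prev);
        if (successors == null) {
            return 0.0;
        }
        int total = successors.values().stream().mapToInt(Integer::intValue).sum();
        return successors.getOrDefault(next, 0) / (double) total;
    }

    public static void main(String[] args) {
        MethodBigramModel model = new MethodBigramModel();
        // Toy sequences standing in for calls observed on a stream-like object.
        model.train(List.of("open", "read", "read", "close"));
        model.train(List.of("open", "read", "close"));
        model.train(List.of("open", "write", "close"));
        System.out.println(model.predictNext("open"));
        System.out.println(model.probability("read", "close"));
    }
}
```

In the toy data, `open` is followed by `read` more often than by `write`, so `read` is the predicted next call. A model along the lines of the abstract would presumably replace the raw method names with richer, structure-aware features.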
Authors
方文渊
刘琰
朱玛
FANG Wen-yuan; LIU Yan; ZHU Ma (State Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou 450000, China)
Source
《计算机科学》
CSCD
Peking University Core Journals (北大核心)
2019, Issue 1, pp. 219-225 (7 pages)
Computer Science
Funding
Supported by the National Key Research and Development Program of China (2017YFB0802900)
Keywords
Statistical language model
Method constraints
Code prediction
Method invocation