Abstract
Traditional similarity algorithms achieve low accuracy when used to check assignments in programming courses. To address this problem, the paper studies algorithms such as the longest common subsequence (LCS), identifies their strengths and weaknesses, and proposes a program similarity calculation method that combines attribute counting with structure measurement. The method first preprocesses the source program by deleting comments and spaces, then determines the commonly used elements and structures. It uses Lex to count and extract program elements, and the open-source ucc to generate a syntax tree from which the corresponding syntactic structures are extracted. Finally, it generates feature vectors and computes the code similarity. Experimental results show that the method is 10.6% more accurate than the LCS algorithm.
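The attribute-counting half of the pipeline and the LCS baseline can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract names Lex and ucc, whereas this sketch uses regular expressions as a stand-in, and the element list `KEYWORDS`, the whitespace collapsing, and the choice of cosine similarity are all assumptions made for illustration. The structure-measurement step (syntax tree extraction) is not covered.

```python
import math
import re
from collections import Counter

# Hypothetical element list; the abstract does not say which elements are counted.
KEYWORDS = ("if", "else", "for", "while", "return", "int")


def strip_comments_and_whitespace(source):
    """Remove C-style comments, then collapse runs of whitespace.

    Naive on purpose: comment markers inside string literals are not handled.
    """
    source = re.sub(r"/\*.*?\*/", " ", source, flags=re.DOTALL)  # block comments
    source = re.sub(r"//[^\n]*", " ", source)                    # line comments
    return re.sub(r"\s+", " ", source).strip()


def attribute_vector(source):
    """Count each entry of KEYWORDS among the identifier-like tokens."""
    tokens = re.findall(r"[A-Za-z_]\w*", strip_comments_and_whitespace(source))
    counts = Counter(tokens)
    return [counts[k] for k in KEYWORDS]


def cosine_similarity(u, v):
    """Cosine of the angle between two count vectors (assumed similarity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def lcs_length(a, b):
    """Dynamic-programming length of the longest common subsequence (the baseline)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]
```

Two submissions that differ only in comments and spacing yield identical count vectors under this sketch, which is the property the preprocessing step is meant to provide.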
Authors
卫军超
耿楠
Wei Junchao; Geng Nan (College of Information Engineering, Northwest A&F University, Yangling, Shaanxi 712100, China)
Source
《信息与电脑》
2017, Issue 3, pp. 99-101, 107 (4 pages in total)
Information & Computer
Funding
University-level teaching reform project of Xi'an Traffic Engineering Institute (Project No. 150006B)
Keywords
attribute counting method
structure measurement technique
similarity measure