Research and Implementation of Program Code Similarity Detection Technology (程序代码相似度检测技术的研究与实现)
Abstract Traditional similarity algorithms give low accuracy when used to check assignments in programming courses. To address this, the authors study algorithms such as the longest common subsequence (LCS), identify their strengths and weaknesses, and propose a program similarity calculation method that combines attribute counting with structure metrics. The method first preprocesses the source program by deleting comments and whitespace, then determines the commonly used elements and structures. Lex is used to count and extract program elements, and the open-source tool ucc is used to generate a syntax tree from which the corresponding syntactic structures are extracted. Finally, feature vectors are generated and the code similarity is computed. Experimental results show that the method improves accuracy by 10.6% over the LCS algorithm.
Authors Wei Junchao (卫军超); Geng Nan (耿楠) (College of Information Engineering, Northwest A&F University, Yangling, Shaanxi 712100, China)
Source Information & Computer (《信息与电脑》), 2017, No. 3, pp. 99-101, 107 (4 pages)
Funding University-level teaching reform project of Xi'an Traffic Engineering Institute (西安交通工程学院), Project No. 150006B
Keywords attribute counting method; structure measurement technique; similarity measure
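
The attribute-counting half of the pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the paper uses Lex for element extraction and ucc for syntax trees, whereas this sketch substitutes a regular-expression tokenizer and omits the structural (syntax tree) features entirely. The feature set `KEYWORDS`, the `<id>`/`<num>` placeholders, the cosine measure, and the LCS normalisation are all assumptions made for illustration, since the abstract does not specify them.

```python
import math
import re
from collections import Counter

# Hypothetical feature set: a few C keywords chosen for illustration only.
KEYWORDS = {"if", "else", "for", "while", "return", "int", "float", "char"}
# Identifiers, integer literals, and a handful of operators; punctuation is skipped.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|==|!=|<=|>=|\+\+|--|[-+*/%=<>]")


def strip_comments(src: str) -> str:
    """Remove /* ... */ and // comments (string literals are ignored for brevity)."""
    src = re.sub(r"/\*.*?\*/", " ", src, flags=re.DOTALL)
    return re.sub(r"//[^\n]*", " ", src)


def tokens(src: str) -> list[str]:
    """Preprocess and tokenize; whitespace is dropped implicitly by the regex."""
    return TOKEN_RE.findall(strip_comments(src))


def attribute_counts(src: str) -> Counter:
    """Build a feature vector of element counts (assumed feature design)."""
    counts = Counter()
    for tok in tokens(src):
        if tok in KEYWORDS:
            counts[tok] += 1
        elif tok[0].isalpha() or tok[0] == "_":
            counts["<id>"] += 1   # all identifiers share one feature
        elif tok[0].isdigit():
            counts["<num>"] += 1  # all numeric literals share one feature
        else:
            counts[tok] += 1      # operators are counted individually
    return counts


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two count vectors (assumed measure)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def lcs_similarity(x: list[str], y: list[str]) -> float:
    """Baseline: LCS length of two token lists over the longer length (assumed)."""
    prev = [0] * (len(y) + 1)
    for xi in x:
        cur = [0]
        for j, yj in enumerate(y, 1):
            cur.append(prev[j - 1] + 1 if xi == yj else max(prev[j], cur[j - 1]))
        prev = cur
    longest = max(len(x), len(y))
    return prev[-1] / longest if longest else 0.0


if __name__ == "__main__":
    a = "int sum(int n) { /* total */ int s = 0; for (int i = 0; i < n; i++) s = s + i; return s; }"
    b = "int total(int k) { // add up\n int t = 0; for (int j = 0; j < k; j++) t = t + j; return t; }"
    print("attribute counting:", cosine_similarity(attribute_counts(a), attribute_counts(b)))
    print("LCS baseline:      ", lcs_similarity(tokens(a), tokens(b)))
```

In this toy pair the second program only renames identifiers and changes the comment, so the count vectors coincide while the raw token sequences do not. This shows how the two measures can diverge on a single example and says nothing about the accuracy figures reported in the paper.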