
Clone Group Mapping Method for Multiple Software Versions Based on LDA and DBSCAN
Abstract: Most clone group mapping approaches compare only adjacent versions, so when a clone group temporarily disappears in intermediate versions, mapping it across multiple versions becomes difficult. To address this, the paper proposes a multi-version clone group mapping method based on LDA and DBSCAN. First, the clone groups of all versions are preprocessed to obtain a collection of clone group documents. Second, a suitable number of topics T is selected according to the Bayesian information criterion, a topic probability model is trained, and every clone group is represented as a probability distribution vector over the T topics. Third, the JS distance between clone groups is computed, and the DBSCAN algorithm gathers clone groups of the same origin into one cluster. Finally, the clone groups within each cluster are sorted by version order, which yields the multi-version clone group mapping result. Mapping experiments on 83 versions of five open-source software systems show that recall and precision both exceed 98%, providing strong support for the analysis and management of clone code.
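The third and fourth steps of the abstract (JS distance, DBSCAN clustering, ordering by version) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: it assumes the LDA topic vectors are already available, takes "JS distance" to mean the square root of the base-2 Jensen-Shannon divergence, and uses a textbook DBSCAN written with the standard library only. The names `js_distance`, `dbscan`, `map_clone_groups`, the sample data, and the values `eps=0.1` and `min_pts=2` are all hypothetical.

```python
import math
from collections import defaultdict


def js_distance(p, q):
    """Square root of the base-2 Jensen-Shannon divergence (assumed definition)."""
    def kl(a, b):
        # Kullback-Leibler divergence; terms with a zero numerator contribute 0.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    m = [(x + y) / 2 for x, y in zip(p, q)]
    return math.sqrt(max(0.0, (kl(p, m) + kl(q, m)) / 2))


def dbscan(points, eps, min_pts, dist):
    """Textbook DBSCAN; returns one label per point, with -1 marking noise."""
    n = len(points)
    # Neighborhoods include the point itself, as in the usual formulation.
    neighbors = [
        [j for j in range(n) if dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = -1  # provisional noise, may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:
                queue.extend(neighbors[j])  # j is a core point, keep expanding
    return labels


def map_clone_groups(groups, eps=0.1, min_pts=2):
    """groups: list of (version, group_id, topic_vector) tuples.

    Returns one version-ordered chain of (version, group_id) per cluster.
    """
    labels = dbscan([g[2] for g in groups], eps, min_pts, js_distance)
    clusters = defaultdict(list)
    for (version, group_id, _), label in zip(groups, labels):
        if label != -1:
            clusters[label].append((version, group_id))
    return [sorted(chain) for _, chain in sorted(clusters.items())]


# Hypothetical topic vectors (T = 3). The first lineage has no clone group
# in version 2, mimicking a group that disappears in an intermediate version.
groups = [
    (1, "v1-g1", [0.90, 0.05, 0.05]),
    (1, "v1-g2", [0.05, 0.90, 0.05]),
    (2, "v2-g1", [0.05, 0.88, 0.07]),
    (3, "v3-g1", [0.88, 0.07, 0.05]),
    (3, "v3-g2", [0.06, 0.90, 0.04]),
]
chains = map_clone_groups(groups)
for chain in chains:
    print(chain)
```

Because all versions are clustered together rather than compared pairwise, the version 1 and version 3 groups of the first lineage end up in the same chain even though nothing links them through version 2. In practice the threshold `eps` would need tuning against real topic distributions.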
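Source: Application Research of Computers (《计算机应用研究》), CSCD, Peking University Core Journal, 2017, No. 2, pp. 481-486 (6 pages)
Funding: National Natural Science Foundation of China (61363017, 61462071); Natural Science Foundation of Inner Mongolia (2014MS0613, 2015MS0606); Scientific Research Project of Higher Education Institutions of the Inner Mongolia Autonomous Region (NJZY16045)
Keywords: clone group mapping; software evolution; LDA; DBSCAN; clone code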