Survey on Data Processing in Code Mining

Original title: 代码挖掘中的数据处理方法综述 (cited by: 1)

Abstract: Source code carries a wide range of information about its developers, including their original concepts, design ideas, and programming habits. Applying data mining to these coding traces in order to extract hidden but useful knowledge is a new and promising research field. Because current mining programs cannot directly process source code in its textual form, researchers need to abstract the code into a more effective intermediate representation that serves as the mining target. This intermediate representation not only constrains which mining algorithms can be used but, more importantly, determines what knowledge can be extracted. This paper describes the general process of code mining and focuses on the characteristics of the various intermediate representations. On this basis, it identifies the open problems in current code mining and points out directions for future work.
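As a rough illustration of the idea in the abstract, the sketch below turns source text into one simple intermediate form (the set of function names called inside each function) and then counts which pairs of calls co-occur often. The choice of Python, the call-set representation, the helper names `function_itemsets` and `frequent_pairs`, and the sample program are all assumptions made for this example; they are not taken from the paper, which surveys several intermediate forms.

```python
import ast
from collections import Counter
from itertools import combinations


def function_itemsets(source: str) -> dict[str, frozenset[str]]:
    """Abstract source text into an itemset per function.

    Each function is mapped to the set of plain names it calls.
    Attribute calls such as obj.method() are ignored in this sketch.
    """
    tree = ast.parse(source)
    itemsets = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            calls = {
                n.func.id
                for n in ast.walk(node)
                if isinstance(n, ast.Call) and isinstance(n.func, ast.Name)
            }
            itemsets[node.name] = frozenset(calls)
    return itemsets


def frequent_pairs(itemsets, min_support: int) -> dict[tuple[str, str], int]:
    """Count call pairs that appear together in at least min_support functions."""
    counts = Counter()
    for items in itemsets.values():
        # Sort so that each unordered pair has one canonical key.
        counts.update(combinations(sorted(items), 2))
    return {pair: c for pair, c in counts.items() if c >= min_support}


# Hypothetical sample program used only to exercise the sketch.
SAMPLE = '''
def a():
    lock()
    work()
    unlock()

def b():
    lock()
    unlock()

def c():
    lock()
    log()
'''

itemsets = function_itemsets(SAMPLE)
pairs = frequent_pairs(itemsets, min_support=2)
print(pairs)
```

Here the intermediate form discards ordering and control flow, so only co-occurrence knowledge can be mined from it; a graph-based or sequence-based form would expose different patterns, which is the dependency between representation and extractable knowledge that the abstract describes.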

Authors: XIE Subin (谢素斌), LIANG Bin (梁彬), SHI Wenchang (石文昌), LIANG Zhaohui (梁朝晖)

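Affiliation: Ministry of Education Laboratory of Data Engineering and Knowledge Engineering, School of Information, Renmin University of China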

Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), indexed in CSCD and the Peking University Core Journals list, 2010, No. 11, pp. 2121-2128 (8 pages)

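Funding: National Natural Science Foundation of China (60873213, 60703103); Beijing Natural Science Foundation (4082018); National High Technology Research and Development Program of China ("863" Program) (2007AA01Z414)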

Keywords: data mining; knowledge pattern; preprocessing; vulnerability detection

Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]

