GKCI: An Improved GNN-based Key Class Identification Method

Abstract: Researchers use the key classes of a software system as starting points for understanding and maintaining it, and defects in these classes pose a serious security risk to the system. Identifying key classes can therefore improve software reliability and stability. The common approach abstracts the system as a class dependency network and computes an importance score for each node from predefined metrics and calculation rules. Because this framework is not trained, it does not make full use of the structural information in the software network. To address this problem, a supervised key class identification method based on graph neural networks (GNNs) is proposed. First, the software system is abstracted as a class-level software network, the network embedding method Node2Vec learns a representation vector for each class node, and a fully connected layer maps each vector to a score. Then, an improved GNN model aggregates node scores, rather than node embeddings, taking into account the direction and weight of the dependencies between class nodes. Finally, the model outputs a final score for each class node, and the nodes are ranked in descending order of score to identify the key classes. The method is evaluated on eight open-source Java software systems against baseline methods. The experimental results show that, among the top 10 candidate key classes, the proposed method achieves 6.4% higher recall and 3.5% higher precision than the state-of-the-art method.
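The pipeline in the abstract (per-node score, direction- and weight-aware score aggregation, descending ranking, top-k precision and recall) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the paper's model: the initial scores are hand-picked stand-ins for the output of the fully connected layer over Node2Vec embeddings, the aggregation rule (a fixed blend of a node's own score with weight-normalized scores from the classes that depend on it) is a simple hypothetical choice, and the names `aggregate_scores`, `rank_classes`, and `precision_recall_at_k` are invented here.

```python
from collections import defaultdict


def aggregate_scores(scores, edges, rounds=1, self_weight=0.5):
    """Propagate scalar scores along weighted, directed dependency edges.

    scores: dict mapping class name -> initial score (a stand-in for the
            dense-layer output; hypothetical values).
    edges:  list of (source, target, weight), read as "source depends on
            target" (an assumed convention).
    """
    current = dict(scores)
    # Total outgoing weight per source, used to normalize each edge.
    out_total = defaultdict(float)
    for src, _dst, weight in edges:
        out_total[src] += weight
    for _ in range(rounds):
        incoming = defaultdict(float)
        for src, dst, weight in edges:
            # Direction: score flows from the dependent class to the class it uses.
            # Weight: each edge carries its share of the source's outgoing weight.
            incoming[dst] += current[src] * weight / out_total[src]
        current = {
            node: self_weight * current[node] + (1.0 - self_weight) * incoming[node]
            for node in current
        }
    return current


def rank_classes(scores):
    """Return class names in descending score order (ties broken by name)."""
    return sorted(scores, key=lambda node: (-scores[node], node))


def precision_recall_at_k(ranking, true_keys, k):
    """Precision and recall of the top-k candidates against labelled key classes."""
    hits = sum(1 for node in ranking[:k] if node in true_keys)
    return hits / k, hits / len(true_keys)


# Toy class dependency network with made-up scores and weights.
initial = {"A": 0.2, "B": 0.4, "C": 0.1, "D": 0.3}
dependencies = [
    ("A", "B", 2.0),
    ("A", "C", 1.0),
    ("C", "B", 1.0),
    ("D", "B", 1.0),
    ("B", "D", 1.0),
]

final = aggregate_scores(initial, dependencies)
ranking = rank_classes(final)
print(ranking)  # ['B', 'D', 'A', 'C']
print(precision_recall_at_k(ranking, {"A", "B"}, k=2))  # (0.5, 0.5)
```

In this toy network, class B is used by three other classes and ends up ranked first, while A, which nothing depends on, loses half of its initial score. A trained model would learn the score mapping and aggregation parameters from labelled key classes instead of using the fixed `self_weight` assumed here.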

Authors: ZHOU Chun-Ying; ZENG Cheng; HE Peng; ZHANG Yan

Affiliations: School of Computer Science and Information Engineering, Hubei University, Wuhan 430062, China; School of Cyber Science and Technology, Hubei University, Wuhan 430062, China; Engineering Technology Research Center for Education Informatization of Hubei Province, Wuhan 430062, China

Source: Journal of Software (《软件学报》; indexed in EI, CSCD, and the Peking University Core Journals list), 2023, Issue 6, pp. 2509-2525 (17 pages)

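Funding: National Natural Science Foundation of China (62102136); Key Research and Development Program of Hubei Province (2021BAA184, 2021BAA188, 2022BAA044); Technology Innovation Special Project of Hubei Province (2020AEA008).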

Keywords: key class identification; software network; graph neural network (GNN); software measurement

Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]

