A Graph Clustering Algorithm Based on Paths Between Nodes in Complex Networks

Original title: 一种基于节点间路径度量的图聚类算法 (cited by: 5)

Abstract: Graph clustering algorithms can be used to discover community structure in social networks, functional modules in protein interaction networks, and similar structures, and are a current focus of complex network research. The core problem is to measure node similarity and the quality of the discovered clusters in a sound way. To address it, this paper gives a node similarity index based on vertex non-repetitive paths between nodes. On this basis it proposes a graph clustering algorithm for complex networks built on a "center-extension" strategy (A Graph Clustering Algorithm Based on Local Paths between Nodes in Complex Networks, PGC), comprising four main stages: node similarity computation, center node selection, initial cluster partition, and cluster optimization. Measuring similarity with vertex non-repetitive paths eliminates the influence on node similarity of the many vertex-repetitive paths caused by large-degree nodes, and improves the algorithm's ability to partition the nodes in the neighborhood of a large-degree node. Experimental comparison with several classical algorithms on 11 real networks and 22 artificial network datasets shows that PGC performs well in normalized mutual information, adjusted Rand index, F-measure, and accuracy.

Extended abstract: Many complex systems can be modeled as complex networks, such as social networks, protein interaction networks, citation networks, and metabolic networks. Nodes in a complex network can often be grouped into clusters, called communities. Nodes in the same group form specific functional modules through tight intra-connection, while nodes from different groups have relatively loose inter-connection that ensures cooperation among the functional modules of the system. Detecting community structure is crucial to understanding the topological structure and dynamic characteristics of networks. By analyzing connection patterns within and between communities, researchers can discover the functional modules and their evolution processes in various complex systems.

Many methods have been put forward to detect communities. Among these, core-extension-based methods perform well in both efficiency and effectiveness. A core-extension algorithm has two essential parts: seed detection and community extension. Seed detection locates seeds with high centrality. Community extension then builds communities from the seeds, based on a node similarity metric and a suitable quality function.

Node similarity metrics play an important role in community detection algorithms, and many methods have been proposed to measure the similarity of nodes in complex networks. For example, Jaccard Index based methods measure the similarity of two nodes by their common direct neighbors. Katz Index based methods measure the similarity of two nodes by the walks between them; compared with Jaccard Index based methods, the Katz Index takes advantage of global topological information. The LS Index measures node similarity by the local walks between nodes (walks of length no larger than 3), so it uses the local connectivity of two nodes rather than only their direct neighbors. The LS Index simplifies the calculation and improves efficiency, but it is still affected by other structural features such as node degree and clustering coefficient. A node with relatively large degree may occur at higher frequency in the paths between two nodes in its direct neighborhood, so the nodes in the direct neighborhood of a large-degree node tend to have higher similarities. As a result, LS Index based methods tend to group the nodes in the neighborhood of a large-degree node into the same cluster, even though in practical networks these nodes often belong to different clusters.

In this paper, we propose a graph clustering algorithm called PGC. We define a novel node similarity index, SLP, based on vertex non-repetitive paths between nodes. The SLP Index weakens the influence of large-degree nodes on the calculation of node similarity and reflects the degree of connectivity between two nodes in the network. First, PGC calculates the SLP similarity of nodes and determines node weights based on SLP. Second, PGC chooses the node with the highest weight as the first seed node, then selects further seed nodes by considering node weights as well as their similarities with the existing seeds. Third, PGC obtains an initial partition by attaching each unseeded node to the seed with which it has the highest SLP similarity. Finally, PGC iteratively optimizes the initial partition to maximize a cluster quality evaluation function based on complementary entropy.

Experimental results show that the SLP Index eliminates the influence of vertex repetitive paths on node similarity and improves the algorithm's ability to cluster the nodes in the neighborhood of large-degree nodes. Compared with other classical graph clustering algorithms on 11 real networks and 22 artificial networks, the proposed algorithm PGC shows preferable performance.
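The center-extension steps described in the abstract can be sketched roughly as follows. This record does not give the SLP formula, the node-weight definition, or the complementary-entropy objective, so everything below is an assumption made for illustration: `disjoint_path_count` is a greedy count of internally vertex-disjoint paths of at most 3 edges standing in for SLP, node weight is taken as the sum of a node's similarities, the seed count `k` is supplied by the caller, and the final optimization stage is omitted. All function and variable names are hypothetical.

```python
from itertools import combinations

def build_adj(edges):
    """Undirected adjacency lists with sorted neighbours (deterministic order)."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return {n: sorted(nb) for n, nb in sorted(adj.items())}

def short_paths(adj, u, v, max_len=3):
    """All simple u-v paths with at most max_len edges (u != v)."""
    paths, stack = [], [(u, [u])]
    while stack:
        node, path = stack.pop()
        if len(path) - 1 >= max_len:
            continue  # path already uses max_len edges, cannot extend
        for nxt in adj[node]:
            if nxt == v:
                paths.append(path + [v])
            elif nxt not in path:
                stack.append((nxt, path + [nxt]))
    return paths

def disjoint_path_count(adj, u, v, max_len=3):
    """Stand-in similarity (NOT the paper's SLP): greedily pick short u-v
    paths, shortest first, that share no internal vertex."""
    used, count = set(), 0
    for path in sorted(short_paths(adj, u, v, max_len), key=len):
        inner = set(path[1:-1])
        if not (inner & used):
            used |= inner
            count += 1
    return count

def center_extension(adj, k, max_len=3):
    """Seed selection and initial partition only; no optimization stage."""
    nodes = list(adj)
    sim = {}
    for u, v in combinations(nodes, 2):
        sim[u, v] = sim[v, u] = disjoint_path_count(adj, u, v, max_len)
    # Assumed weight: total similarity of a node to all other nodes.
    weight = {u: sum(sim[u, v] for v in nodes if v != u) for u in nodes}
    seeds = [max(nodes, key=lambda u: weight[u])]
    while len(seeds) < k:
        rest = [u for u in nodes if u not in seeds]
        # Assumed rule: least similar to existing seeds first, then heaviest.
        seeds.append(max(rest, key=lambda u: (-max(sim[u, s] for s in seeds),
                                              weight[u])))
    clusters = {s: [s] for s in seeds}
    for u in nodes:
        if u not in seeds:
            clusters[max(seeds, key=lambda s: sim[u, s])].append(u)
    return clusters

# Hub 0 touches every node; {1, 2, 3} and {4, 5, 6} are triangles.
hub = build_adj([(0, n) for n in range(1, 7)]
                + [(1, 2), (1, 3), (2, 3), (4, 5), (4, 6), (5, 6)])
# Two 4-cliques joined by the single bridge edge 3-4.
two_cliques = build_adj(list(combinations(range(4), 2))
                        + list(combinations(range(4, 8), 2)) + [(3, 4)])

if __name__ == "__main__":
    print(len(short_paths(hub, 1, 4)), disjoint_path_count(hub, 1, 4))
    print(center_extension(two_cliques, k=2))
```

In the `hub` graph, nodes 1 and 4 lie in different triangles, yet five short paths connect them, all through the hub; counting only internally vertex-disjoint paths reduces this to one, while nodes 1 and 2 in the same triangle still score three. That is the effect of large-degree nodes the abstract describes, shown on a toy graph rather than with the paper's actual index.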

Authors: ZHENG Wen-Ping (郑文萍); CHE Chen-Hao (车晨浩); QIAN Yu-Hua (钱宇华); WANG Jie (王杰); YANG Gui (杨贵) (School of Computer & Information Technology, Shanxi University, Taiyuan 030006; Key Laboratory of Computational Intelligence & Chinese Information Processing of Ministry of Education, Shanxi University, Taiyuan 030006; Research Institute of Big Data Science and Industry, Shanxi University, Taiyuan 030006)

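Affiliations: School of Computer & Information Technology, Shanxi University; Key Laboratory of Computational Intelligence & Chinese Information Processing of Ministry of Education, Shanxi University; Research Institute of Big Data Science and Industry, Shanxi University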

Source: Chinese Journal of Computers (《计算机学报》), indexed in EI, CSCD, and the Peking University Core Journals list; 2020, Issue 7, pp. 1312-1327 (16 pages)

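Funding: Supported by the National Natural Science Foundation of China (61572005), the Natural Science Foundation of Shanxi Province (201801D121123), and the Shanxi Province Research Fund for Returned Overseas Scholars (2017-014).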

Keywords: complex network; graph clustering; cluster structure; node similarity; connectivity

Classification number: TP301 [Automation and Computer Technology: Computer System Architecture]
