摘要
图聚类算法可以用于发现社会网络中的社区结构、蛋白质互作用网络中的功能模块等,是当前复杂网络研究的热点之一.对网络中节点的相似性和簇发现结果进行合理度量是核心问题.针对此问题,给出了一种基于节点间点不重复路径度量的节点相似性指标.以此为基础提出了一种面向复杂网络的基于“中心-扩展”策略的图聚类算法(A Graph Clustering Algorithm Based on Local Paths between Nodes in Complex Networks,PGC),包括节点相似性计算、中心节点选择、初始簇划分和簇优化四个主要过程.采用点不重复路径对节点相似性进行度量,消除了由大度节点引起较多的点重复路径对节点相似性的影响,提高了算法对大度节点邻域中节点的划分能力.通过与一些经典算法在11个真实网络、22个人工网络数据集上的实验比较分析,结果表明算法PGC在标准互信息、调整兰德系数、F度量、准确度等方面均表现出良好的性能.
Many complex systems can be modeled as complex networks,such as social network,protein interaction network,citation network,metabolic network etc.Nodes in a complex network often can be grouped into different clusters,called communities.Nodes in the same group form specific functional modules through tight intra-connection,and nodes from different group have relatively loose inter-connection to ensure cooperation among the functional modules of the system.Detecting community structures is crucial to understand the topological structure and dynamic characteristics of networks.Based on analyzing connecting patterns within and between communities,researchers can discover the functional modules and their evolution processes in various complex systems.Many methods have been put forward to detect communities.Among these,core-extension-based methods show good performance in efficiency and effectiveness.There are two essential parts in core-extension algorithms:seed detection and community extension.Seed detection process locates seeds with high centrality.Then,communities can be built from the seeds based on node similarity metrics and proper quality function in community extension process.Node similarity metrics play important roles in community detection algorithms.Lots of methods have been proposed to measure similarity of nodes in complex networks.For example Jaccard Index based methods measure nodes’similarity based on their common direct neighbors.Katz Index based methods measure nodes’similarity based on the walks between two nodes.Comparing with Jaccard Index based methods,Katz Index takes advantage of general structure topology information.LS Index measures node’s similarity based on the local walks(lengths of walks are no larger than 3)between nodes,and can measure similarity between nodes by using their local connectivity information rather than their direct neighbors.LS Index simplifies the calculation and improve the efficiency,but it is still affected by other structure features such as node’s degree and clustering coefficient.For a node with relatively larger degree in a network,it might occur at higher frequencies in paths between two nodes in its direct neighborhood.The nodes in the direct neighborhood of a large-degree node tend to have higher similarities.As a result,LS index based methods tend to group the nodes in the neighborhood of a large-degree node into the same cluster.However,these nodes are often grouped into different clusters in practical networks.In this paper,we propose a graph clustering algorithm,called PGC.We define a novel node similarity index SLP based on vertex non-repetitive paths between nodes.The proposed SLP Index weakens the influence of large-degree nodes on the calculation of nodes’similarity,and can reflect the connectivity degree between two nodes in the network.First,the proposed PGC algorithm calculates nodes’SLP similarity,and determines node weights based on SLP.Second,PGC chooses the node with the highest weight as the first seed node,then selects other seed nodes by considering node weights as well as their similarities with the existing seeds.Then,PGC obtains initial partition by attaching each unseeded node to the seed with the highest SLP similarity with it.Finally,PGC optimizes the initial partition iteratively to maximize the cluster quality evaluation function which is based on complementary entropy.Experimental results show that SLP Index eliminates the influence on the nodes’similarity caused by vertex repetitive paths,and improves the algorithm’s ability to cluster the nodes in the neighborhood of large-degree nodes.Compared with other classical graph clustering algorithms on 11 real networks and 22 artificial networks,the proposed algorithm PGC shows a preferable performance.
作者
郑文萍
车晨浩
钱宇华
王杰
杨贵
ZHENG Wen-Ping;CHE Chen-Hao;QIAN Yu-Hua;WANG Jie;YANG Gui(School of Computer&Information Technology,Shanxi University,Taiyuan 030006;Key Laboratory Computational Intelligence&Chinese Information Processing of Ministry of Education,Shanxi University,Taiyuan 030006;Research Institute of Big Data Science and Industry,Shanxi University,Taiyuan 030006)
出处
《计算机学报》
EI
CSCD
北大核心
2020年第7期1312-1327,共16页
Chinese Journal of Computers
基金
国家自然科学基金项目(61572005)
山西省自然科学基金(201801D121123)
山西省回国留学人员科研基金项目(2017-014)资助.
关键词
复杂网络
图聚类
簇结构
相似性度量
连通性
complex network
graph clustering
cluster structure
node similarity
connectivity