Abstract
Network clustering has received increasing attention because of its wide range of real-world applications. Heterogeneous information network (HIN) clustering outperforms traditional homogeneous network clustering because an HIN preserves the heterogeneity of nodes and relations. However, existing HIN clustering methods based on graph neural networks (GNNs) ignore the fact that node features and topology structure contribute to clustering with different weights. Moreover, these methods cluster only target nodes of a single type and do not consider the auxiliary role of nodes of other types in the HIN, which significantly degrades their performance. To this end, we propose a bi-channel co-clustering algorithm for heterogeneous information networks, abbreviated B3C, which effectively fuses node features and topology structure and captures the hidden correlations between heterogeneous nodes, thereby improving clustering performance. Specifically, we first design a simple yet effective bi-channel encoder that aggregates neighborhood information with respect to both the topology structure and a similarity matrix. Then, self-training clustering is applied to jointly learn HIN representations and optimize the cluster assignments. Next, a co-clustering mechanism is used to cluster nodes of different types simultaneously. Finally, we adopt the triplet-center loss to learn discriminative node embeddings, so that similar nodes are pulled together and dissimilar nodes are pushed apart. Extensive experiments on public datasets show that the proposed bi-channel encoder significantly outperforms widely used GNN encoders, and that B3C achieves higher accuracy than state-of-the-art learning-based HIN clustering methods.
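The abstract names two training objectives without giving their formulas. The following is a minimal NumPy sketch of the generic forms these techniques usually take, not the authors' implementation: the self-training part assumes the Student's t soft assignment and squared-and-renormalized target distribution commonly used in deep embedded clustering, and the triplet-center part assumes squared Euclidean distances to cluster centers with a fixed margin. All function names, the toy embeddings `z`, the `centers`, and the values of `alpha` and `margin` are hypothetical choices for illustration.

```python
import numpy as np

def soft_assign(z, centers, alpha=1.0):
    """Student's t soft assignment q[i, j] of embedding i to cluster j (assumed form)."""
    d2 = ((z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)

def target_distribution(q):
    """Sharpened targets: p[i, j] proportional to q[i, j]^2 / (soft size of cluster j)."""
    w = q ** 2 / q.sum(axis=0)
    return w / w.sum(axis=1, keepdims=True)

def kl_clustering_loss(p, q, eps=1e-12):
    """KL(P || Q), the usual self-training clustering objective."""
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def triplet_center_loss(z, labels, centers, margin=1.0):
    """Hinge on (distance to own center) + margin - (distance to nearest other center)."""
    d2 = ((z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    idx = np.arange(len(z))
    pos = d2[idx, labels]              # squared distance to the assigned center
    masked = d2.copy()
    masked[idx, labels] = np.inf       # exclude the assigned center
    neg = masked.min(axis=1)           # squared distance to the nearest other center
    return float(np.maximum(pos + margin - neg, 0.0).mean())

# Toy embeddings: two well-separated groups (hypothetical data).
z = np.array([[0.0, 0.0], [0.2, 0.0], [4.0, 4.0], [4.0, 3.8]])
centers = np.array([[0.1, 0.0], [4.0, 3.9]])

q = soft_assign(z, centers)
p = target_distribution(q)
labels = q.argmax(axis=1)              # pseudo-labels taken from the soft assignment
loss_kl = kl_clustering_loss(p, q)
loss_tc = triplet_center_loss(z, labels, centers, margin=1.0)
```

In this sketch the pseudo-labels come from the soft assignment itself, which is one plausible way to apply a center-based loss without ground-truth labels. How B3C actually combines the two terms, and how it shares them across node types in co-clustering, is not specified in the abstract.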
Authors
邱林山
房子荃
陈璐
张天明
李天义
QIU Lin-Shan; FANG Zi-Quan; CHEN Lu; ZHANG Tian-Ming; LI Tian-Yi (College of Computer Science and Technology, Zhejiang University, Hangzhou 310027; College of Software Technology, Zhejiang University, Ningbo, Zhejiang 315048; College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023; Department of Computer Science, Aalborg University, Aalborg 9220, Denmark)
Source
《计算机学报》
EI
CAS
CSCD
Peking University Core Journal (北大核心)
2023, Issue 11, pp. 2416-2430 (15 pages)
Chinese Journal of Computers
Funding
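Supported by the Young Scientists Fund of the National Natural Science Foundation of China (No. 62102351)
and the Exploration Youth Project of the Zhejiang Provincial Natural Science Foundation (No. LQ22F020018).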
Keywords
heterogeneous information network
network clustering
co-clustering
network representation learning
graph neural network