Abstract
Traditional overlapping community detection relies on network structure alone, specifically the connections between nodes. Because it ignores node content, it struggles to capture the semantics of network communities. This paper proposes OCD_LDA (Overlapping Community Detection algorithm based on LDA), a node-attribute-based algorithm for detecting overlapping communities in large-scale networks. The algorithm uses the LDA topic model to model the multi-dimensional attributes of node content: each network node is treated as a document, and the multi-dimensional attribute values it carries are treated as the words of that document. Communities in the network therefore correspond to topics in the topic model, and a node's membership in multiple communities corresponds to a document's multiple topics. The algorithm also addresses the data sparsity that short node content causes during topic modeling: it introduces a Spike and Slab prior into the LDA topic model to assist variable selection and parameter estimation, which effectively addresses the sparsity and smoothness of the community distribution over nodes. Experiments on the DBLP bibliographic dataset show that OCD_LDA detects overlapping communities in large-scale networks more effectively and reveals intrinsic properties of complex data.
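The node-as-document mapping described in the abstract can be illustrated with a minimal sketch. It is not the authors' OCD_LDA implementation: it runs plain LDA with collapsed Gibbs sampling and omits the Spike and Slab prior entirely. The function names (`lda_gibbs`, `overlapping_communities`), the hyperparameters, the toy node attributes, and the membership threshold are all assumptions made for illustration.

```python
import random


def lda_gibbs(docs, n_topics, n_iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for plain LDA (no Spike and Slab prior).

    docs: one list of attribute tokens per network node (node = document).
    Returns theta: for each node, a distribution over topics (communities).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    node_topic = [[0] * n_topics for _ in docs]            # count of topic k in node d
    topic_word = [[0] * n_words for _ in range(n_topics)]  # count of word v in topic k
    topic_total = [0] * n_topics                           # tokens assigned to topic k
    assignments = []

    # Random initial topic assignment for every attribute token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_doc.append(k)
            node_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                node_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Resample its topic from the collapsed conditional.
                weights = [
                    (node_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                node_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Smoothed per-node topic proportions.
    theta = []
    for d, doc in enumerate(docs):
        denom = len(doc) + n_topics * alpha
        theta.append([(node_topic[d][k] + alpha) / denom for k in range(n_topics)])
    return theta


def overlapping_communities(theta, threshold=0.3):
    """A node joins every community whose proportion reaches the threshold."""
    return [{k for k, p in enumerate(row) if p >= threshold} for row in theta]


# Hypothetical toy data: each node is an author, each token an attribute value.
nodes = [
    ["database", "query", "index", "database"],
    ["query", "index", "transaction"],
    ["neural", "learning", "network", "learning"],
    ["learning", "neural", "database", "query"],  # plausibly spans both areas
]
theta = lda_gibbs(nodes, n_topics=2)
communities = overlapping_communities(theta)
```

Thresholding the per-node topic proportions is only one possible way to read overlapping membership out of the model; a node whose attributes mix vocabularies can end up in more than one community, which is the overlap the abstract refers to.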
Authors
ZHANG Wei (张伟)
QI Dehao (祁德昊)
CHEN Yunfang (陈云芳)
School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
Source
Journal of Nanjing University of Posts and Telecommunications: Natural Science Edition (《南京邮电大学学报(自然科学版)》)
PKU Core Journal (北大核心)
2018, No. 3, pp. 54-64 (11 pages)
Funding
Supported by the National Natural Science Foundation of China (61272422, 61672297)
Keywords
social networks
LDA
community detection
overlapping communities