Social Relationship Mining Algorithm by Multi-Dimensional Graph Structural Clustering (多维图结构聚类的社交关系挖掘算法)

Cited by: 7
Abstract Social relationship mining is a popular topic in massive graph data research. Graph clustering algorithms such as SCAN (structural clustering algorithm for networks) can quickly extract tightly connected communities from massive graph data, but such communities only capture how social objects aggregate and do not reveal the real social relationships among them, such as family members, colleagues, or classmates. Recovering these relationships requires mining the complex interactions among social objects across more dimensions. Individuals interact in many ways, for example by calling, meeting, chatting on WeChat, and sending emails, whereas traditional clustering algorithms such as SCAN can handle only single-dimensional interaction data. Building on a study of multi-dimensional social graph data and traditional graph structural clustering algorithms, this paper proposes an effective subspace clustering algorithm, SCA (subspace cluster algorithm), which studies graph structural clustering in the subspaces of a multi-dimensional graph as a means of discovering real social relationships through graph data mining. SCA follows the bottom-up principle and can discover the cluster sets of all subspaces in the social graph data. To speed up SCA, the paper exploits the monotonicity of subspace clustering and proposes a pruning algorithm, SCA+. Large-scale performance experiments and a case study on real data verify the efficiency and effectiveness of the proposed algorithms.
Source Journal of Software (《软件学报》), indexed in EI, CSCD, and the PKU Core list, 2018, No. 3, pp. 839-852 (14 pages)
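Funding National Natural Science Foundation of China (61402292, 61772091); Joint Fund of the National Natural Science Foundation of China and Guangdong Province (U1301252); Humanities and Social Sciences Research Planning Fund of the Ministry of Education (15YJAZH058)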
Keywords graph clustering; multi-dimensional graph data; social relationship; subspace
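
The abstract describes a bottom-up search over subspaces that uses monotonicity to prune. The Python sketch below illustrates that general pattern only; it is not the paper's SCA or SCA+. It rests on several assumptions that do not come from the source: a subspace is a set of interaction dimensions, an edge belongs to a subspace only if it appears in every one of its dimensions, and a "cluster" is simply a connected component with at least `min_size` vertices (a stand-in for SCAN's structural clustering, chosen because it is trivially monotone). All function and parameter names are hypothetical.

```python
from itertools import combinations


def edge(u, v):
    """Normalize an undirected edge so (u, v) and (v, u) compare equal."""
    return tuple(sorted((u, v)))


def components(vertices, edges):
    """Connected components of an undirected graph, via union-find."""
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, v in edges:
        parent[find(u)] = find(v)
    groups = {}
    for v in vertices:
        groups.setdefault(find(v), set()).add(v)
    return list(groups.values())


def clusters_in_subspace(graph, subspace, min_size):
    """Assumed cluster definition: components of the edges shared by
    every dimension in `subspace`, kept only if large enough."""
    shared = set.intersection(*(graph[d] for d in subspace))
    vertices = {v for e in shared for v in e}
    return [frozenset(c) for c in components(vertices, shared)
            if len(c) >= min_size]


def subspace_clusters(graph, min_size=3):
    """Bottom-up enumeration: start from single dimensions and only
    extend subspaces that still contain a cluster. Under the assumed
    definition, adding a dimension can only remove edges, so a subspace
    without clusters cannot gain one when extended (monotonicity)."""
    result = {}
    level = [(d,) for d in sorted(graph)]
    while level:
        survivors = []
        for s in level:
            found = clusters_in_subspace(graph, s, min_size)
            if found:
                result[s] = found
                survivors.append(s)
        alive = set(survivors)
        candidates = set()
        # Apriori-style join: a (k+1)-subspace is a candidate only if
        # every one of its k-dimensional subsets survived.
        for a, b in combinations(survivors, 2):
            merged = tuple(sorted(set(a) | set(b)))
            if len(merged) == len(a) + 1 and all(
                sub in alive for sub in combinations(merged, len(a))
            ):
                candidates.add(merged)
        level = sorted(candidates)
    return result


# Toy data (invented for illustration): three interaction dimensions.
toy_graph = {
    "call": {edge("a", "b"), edge("b", "c"), edge("a", "c"),
             edge("c", "d"), edge("d", "e")},
    "meet": {edge("a", "b"), edge("b", "c"), edge("a", "c")},
    "email": {edge("d", "e"), edge("x", "y")},
}
toy_result = subspace_clusters(toy_graph, min_size=3)
```

On this toy input the `email` dimension has no component of three or more vertices, so no subspace containing `email` is ever examined, while the `call`+`meet` subspace retains the group `{a, b, c}`. A structural-similarity clustering like SCAN's would replace `clusters_in_subspace`; whether the pruning rule remains valid then depends on how the paper defines clusters in a subspace, which the abstract does not specify.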