Abstract
Real-world networked data often contain different types of objects and links. Modeling such data as a heterogeneous information network captures the interacting objects and the relations among them more comprehensively, and it carries rich semantics. Heterogeneous information network analysis has therefore become a research focus in data mining. Clustering on homogeneous networks has been studied intensively, but little work has addressed clustering on heterogeneous networks. There, the coexistence of multiple object types and rich semantics poses new challenges for cluster analysis. In this paper, we study the clustering problem in heterogeneous information networks and propose HeteClus, a clustering method based on matrix factorization. The method first uses HeteSim to compute an object similarity matrix along a user-specified (user-guided) semantic path. It then applies orthogonal non-negative matrix tri-factorization to obtain soft or hard clustering results for the nodes. Experiments on artificial and real network data demonstrate the effectiveness of the method, and case studies illustrate the physical meaning of the matrix factorization results.
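The two-step pipeline described in the abstract can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' HeteClus implementation. The helpers `row_normalize`, `hetesim`, `onmtf` and `soft_memberships` are hypothetical names.

`hetesim` follows the HeteSim idea: it takes the cosine of the reachable-probability vectors of the two path halves, with transition matrices obtained by row-normalizing adjacency matrices. It only handles even-length paths, whereas HeteSim proper handles odd lengths by decomposing the middle relation into edge objects.

`onmtf` uses multiplicative update rules commonly applied to orthogonal non-negative matrix tri-factorization. These may differ from the exact rules, initialization and stopping criteria used in the paper.

```python
import numpy as np


def row_normalize(W):
    """Adjacency matrix -> transition-probability matrix (each non-empty row sums to 1)."""
    W = np.asarray(W, dtype=float)
    sums = W.sum(axis=1, keepdims=True)
    return np.divide(W, sums, out=np.zeros_like(W), where=sums > 0)


def hetesim(adjs):
    """Hypothetical helper: HeteSim between the end types of a path A1-A2-...-A(l+1).

    adjs[k] is the adjacency matrix between consecutive object types on the path.
    Assumption of this sketch: the path length l is even, so the path splits at
    its middle type and no edge-object decomposition is needed.
    """
    adjs = [np.asarray(W, dtype=float) for W in adjs]
    l = len(adjs)
    if l == 0 or l % 2:
        raise ValueError("this sketch only handles even-length paths")
    half = l // 2
    # Reachable probabilities from A1 to the middle type (left half of the path).
    left = row_normalize(adjs[0])
    for W in adjs[1:half]:
        left = left @ row_normalize(W)
    # Reachable probabilities from A(l+1) back to the middle type (reversed right half).
    right = row_normalize(adjs[-1].T)
    for W in reversed(adjs[half:-1]):
        right = right @ row_normalize(W.T)
    # Cosine between the two probability distributions over the middle type.
    raw = left @ right.T
    denom = np.outer(np.linalg.norm(left, axis=1), np.linalg.norm(right, axis=1))
    return np.divide(raw, denom, out=np.zeros_like(raw), where=denom > 0)


def onmtf(X, k_rows, k_cols, n_iter=300, seed=0, eps=1e-12):
    """Hypothetical helper: X ~ F S G^T with F, S, G >= 0, fitted by multiplicative updates.

    The orthogonality of F and G is encouraged only implicitly by the update rules.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    n, m = X.shape
    F = rng.random((n, k_rows)) + eps
    S = rng.random((k_rows, k_cols)) + eps
    G = rng.random((m, k_cols)) + eps
    for _ in range(n_iter):
        XtFS = X.T @ F @ S
        G *= np.sqrt(XtFS / (G @ (G.T @ XtFS) + eps))
        XGSt = X @ G @ S.T
        F *= np.sqrt(XGSt / (F @ (F.T @ XGSt) + eps))
        FtXG = F.T @ X @ G
        S *= np.sqrt(FtXG / (F.T @ F @ S @ (G.T @ G) + eps))
    return F, S, G


def soft_memberships(F):
    """Row-normalize F so each object's cluster weights sum to 1 (soft clustering)."""
    sums = F.sum(axis=1, keepdims=True)
    return np.divide(F, sums, out=np.zeros_like(F), where=sums > 0)


if __name__ == "__main__":
    # Toy author-paper adjacency: authors 0,1 share papers 0,1; authors 2,3 share papers 2,3.
    W_ap = np.array([[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]])
    sim = hetesim([W_ap, W_ap.T])  # author-paper-author path
    F, core, G = onmtf(sim, k_rows=2, k_cols=2)
    print(soft_memberships(F))  # soft clustering of authors
    print(F.argmax(axis=1))  # hard clustering of authors
```

Each row of F holds an object's weights over the row clusters. Row-normalizing F therefore gives a soft clustering, and taking the arg-max of each row gives a hard clustering. When the path connects two different object types, G plays the same role for the column objects. Because the updates start from a random initialization, results can vary with `seed`.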
Source
Journal of Chinese Computer Systems (小型微型计算机系统)
CSCD
Peking University Core Journals (北大核心)
2014, No. 10, pp. 2256-2261 (6 pages)
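Funding
National Natural Science Foundation of China (61375058, 61074128, 71231002, 60905025)
National Basic Research Program of China (2013CB329603)
Fundamental Research Funds for the Central Universities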
Keywords
heterogeneous information network
clustering analysis
nonnegative matrix factorization
matrix tri-factorization
semantic similarity search