
A Hierarchical Co-Clustering Algorithm for High-Order Heterogeneous Data

Cited by: 6
Abstract: High-order heterogeneous data, in which objects are described by multiple feature spaces drawn from heterogeneous domains, are increasingly common in real-world applications. High-order co-clustering algorithms can fuse information from several feature spaces to improve clustering quality, so they have become an active research topic in recent years. Most existing high-order co-clustering algorithms, however, are non-hierarchical, even though high-order heterogeneous data often contain hidden hierarchical cluster structures. To mine these hidden hierarchical patterns more effectively, we propose a high-order hierarchical co-clustering algorithm (HHCC). HHCC uses Goodman-Kruskal τ, an index of association between categorical variables, to measure the association between object variables and feature variables. Strongly associated objects are assigned to the same object cluster, and, simultaneously, strongly associated features are assigned to the same feature cluster. HHCC follows a top-down hierarchical strategy. At each level it uses Goodman-Kruskal τ to evaluate the clustering quality of both objects and features, and it optimizes τ with a local search method. This determines the number of clusters automatically and yields the clustering result for that level. The levels together form a tree-like cluster structure. Experimental results show that HHCC outperforms four classical homogeneous hierarchical clustering algorithms and five existing non-hierarchical high-order co-clustering algorithms.
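The abstract describes HHCC only at a high level. As a rough illustration of the ingredients it names, the Python sketch below first computes Goodman-Kruskal τ on a cluster contingency table:

τ(Y|X) = (Σ_i Σ_j p_ij² / p_i· − Σ_j p_·j²) / (1 − Σ_j p_·j²)

Here X is the row (object-cluster) variable, Y is the column (feature-cluster) variable, and p_ij is the share of the data mass in object cluster i and feature cluster j. The sketch then maximizes τ by local search, which lets the number of clusters emerge rather than be fixed in advance. Finally, it splits the objects top-down into a tree.

This is not the authors' implementation. The following choices are all assumptions:

- the objective τ(F|O) + τ(O|F)
- the singleton starting partition
- the greedy single-element moves
- the rule "stop splitting when the score is about 0"

The function names (`tau`, `co_cluster`, `build_tree`, etc.) are hypothetical. The sketch handles only one feature space, so it does not show the multi-space fusion that makes HHCC "high-order".

```python
"""Hypothetical sketch: Goodman-Kruskal tau co-clustering with top-down splits.

Assumptions (not from the paper): a single object-by-feature count matrix,
the objective tau(F|O) + tau(O|F), singleton initialisation, and greedy
one-element moves. HHCC's fusion of several feature spaces is not modelled.
"""


def tau(table):
    """Goodman-Kruskal tau of the column variable given the row variable."""
    total = sum(sum(row) for row in table)
    if total == 0:
        return 0.0
    col_p = [sum(row[j] for row in table) / total for j in range(len(table[0]))]
    base = sum(p * p for p in col_p)  # sum of squared column marginals
    if base >= 1.0 - 1e-12:  # only one non-empty column: tau undefined, use 0
        return 0.0
    explained = 0.0
    for row in table:
        row_p = sum(row) / total
        if row_p > 0:  # empty clusters contribute nothing
            explained += sum((x / total) ** 2 for x in row) / row_p
    return (explained - base) / (1.0 - base)


def contingency(matrix, row_lab, col_lab):
    """Sum the data matrix into an object-cluster x feature-cluster table."""
    table = [[0.0] * (max(col_lab) + 1) for _ in range(max(row_lab) + 1)]
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            table[row_lab[i]][col_lab[j]] += value
    return table


def score(matrix, row_lab, col_lab):
    """Assumed objective: tau(features | objects) + tau(objects | features)."""
    table = contingency(matrix, row_lab, col_lab)
    return tau(table) + tau([list(col) for col in zip(*table)])


def compact(labels):
    """Renumber cluster labels 0..k-1 in order of first appearance."""
    mapping = {}
    return [mapping.setdefault(lab, len(mapping)) for lab in labels]


def co_cluster(matrix, max_sweeps=20):
    """Greedy local search: move one object or feature at a time to the
    existing or brand-new cluster that most increases the score, so the
    number of clusters is not fixed in advance."""
    row_lab = list(range(len(matrix)))  # start from singleton clusters
    col_lab = list(range(len(matrix[0])))
    best = score(matrix, row_lab, col_lab)
    for _ in range(max_sweeps):
        improved = False
        for labels in (row_lab, col_lab):
            for i in range(len(labels)):
                best_lab = labels[i]
                for cand in range(max(labels) + 2):  # existing labels + a new one
                    labels[i] = cand
                    s = score(matrix, row_lab, col_lab)
                    if s > best + 1e-12:  # accept strict improvements only
                        best, best_lab, improved = s, cand, True
                labels[i] = best_lab
        if not improved:
            break
    return compact(row_lab), compact(col_lab), best


def build_tree(matrix, rows=None, depth=0, max_depth=3):
    """Top-down hierarchy: co-cluster the current objects, then recurse into
    each object cluster; a node becomes a leaf when no association is found."""
    rows = list(range(len(matrix))) if rows is None else rows
    node = {"rows": rows, "feature_clusters": None, "children": []}
    if len(rows) < 2 or depth >= max_depth:
        return node
    row_lab, col_lab, best = co_cluster([matrix[i] for i in rows])
    node["feature_clusters"] = col_lab
    if best <= 1e-12 or max(row_lab) == 0:
        return node  # no useful split at this level
    for k in range(max(row_lab) + 1):
        child = [r for r, lab in zip(rows, row_lab) if lab == k]
        node["children"].append(build_tree(matrix, child, depth + 1, max_depth))
    return node


if __name__ == "__main__":
    data = [[5, 5, 0, 0], [5, 5, 0, 0], [0, 0, 5, 5], [0, 0, 5, 5]]
    print(co_cluster(data))
    print([c["rows"] for c in build_tree(data)["children"]])
```

On the block-diagonal matrix in the `__main__` section, the search merges rows {0, 1} and {2, 3} and columns {0, 1} and {2, 3}, reaching the maximum score of 2.0. The tree then stops one level down, because the identical rows inside each block carry no further association. Each candidate move rebuilds the whole contingency table, which keeps the sketch short but makes it slow. A practical version would update τ incrementally after each move.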
Source: Journal of Computer Research and Development (《计算机研究与发展》), 2015, No. 1, pp. 200-210 (11 pages). Indexed in EI, CSCD, and the Peking University core journal list.
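Funding: National Natural Science Foundation of China (71272216); National Key Technology R&D Program of China (2012BAH08B02); Fundamental Research Funds for the Central Universities (HEUCF100603, HEUCFZ1212).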
Keywords: high-order heterogeneous data; co-clustering; hierarchical clustering; measure of association; multiple feature spaces