
HCLOPE: An Optimised Hierarchical Clustering Algorithm for Categorical Data Processing

Cited by: 2
Abstract: As the volume of categorical data grows rapidly, research on clustering methods for categorical data is becoming increasingly important. Among existing algorithms, CLOPE outperforms comparable algorithms in running speed, memory consumption, and clustering results. However, its clustering quality is not optimal, and its results depend on the order of the input data, which makes it unstable. To address this, we propose HCLOPE, a hierarchical clustering algorithm for categorical data that uses bottom-up agglomeration to generate stable clustering results. We also define the global maximum profit delta between clusters as the merging criterion, and introduce an undirected graph structure to optimise the iterative cluster-merging process. Experimental results on the mushroom dataset show that HCLOPE achieves better clustering quality.
Source: Computer Applications and Software (《计算机应用与软件》), CSCD, 2016, No. 7, pp. 60-63 (4 pages)
Keywords: HCLOPE; categorical data; hierarchical clustering; stability; undirected graph
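
As an illustration of the bottom-up merging described in the abstract, the following Python sketch greedily merges the pair of clusters with the largest positive gain in a CLOPE-style profit. It is not the authors' implementation: the abstract does not give HCLOPE's exact formulas, so the profit definition (the one published for CLOPE, with repulsion parameter r), the stopping rule, and all function names here are assumptions. The undirected-graph optimisation is omitted in favour of a brute-force search over all pairs.

```python
from collections import Counter
from itertools import combinations


def cluster_stats(transactions):
    """Return (S, W, N): total item occurrences, distinct items, transaction count."""
    counts = Counter(item for t in transactions for item in t)
    return sum(counts.values()), len(counts), len(transactions)


def profit_numerator(clusters, r):
    """Sum over clusters of S / W**r * N, the numerator of CLOPE's profit.

    The denominator (total number of transactions) never changes during
    merging, so comparing numerators is enough to rank merges.
    Assumes every transaction is non-empty (otherwise W could be 0).
    """
    total = 0.0
    for c in clusters:
        s, w, n = cluster_stats(c)
        total += s / (w ** r) * n
    return total


def merge_gain(a, b, r):
    """Change in the profit numerator if clusters a and b were merged."""
    return profit_numerator([a + b], r) - profit_numerator([a, b], r)


def agglomerate(transactions, r=2.0):
    """Hypothetical greedy agglomeration: apply the best merge while it helps."""
    # Start with one cluster per transaction (bottom-up).
    clusters = [[frozenset(t)] for t in transactions]
    while len(clusters) > 1:
        # Brute-force search for the globally best pair (i < j).
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda ij: merge_gain(clusters[ij[0]], clusters[ij[1]], r),
        )
        # Assumed stopping rule: stop when no merge increases the profit.
        if merge_gain(clusters[i], clusters[j], r) <= 0:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


if __name__ == "__main__":
    data = [{"a", "b"}, {"a", "b"}, {"x", "y"}, {"x", "y"}]
    for cluster in agglomerate(data):
        print([sorted(t) for t in cluster])
```

Because every step picks the globally best pair rather than processing transactions one at a time, the grouping in this sketch is determined by the data rather than by a single sequential pass, which is the kind of stability the abstract attributes to the agglomerative approach. Ties between equally good merges are still broken by position here.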
