Abstract
As the volume of categorical data grows rapidly, research on clustering methods for categorical data is becoming increasingly important. Among existing algorithms, CLOPE outperforms comparable algorithms in running speed, memory consumption, and clustering results. However, its clustering quality is not optimal, and its results depend on the order of the input data, which makes them unstable. To address this, we propose HCLOPE, a hierarchical clustering algorithm for categorical data that uses bottom-up agglomeration to produce stable clustering results. We also define the global maximum profit delta between clusters as the merging criterion, and introduce an undirected graph structure to optimise the iterative merging process. Experimental results on the mushroom benchmark dataset show that HCLOPE achieves better clustering quality.
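The merging idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the standard CLOPE profit, in which each cluster contributes S * N / W^r (S = total item occurrences, N = number of transactions, W = number of distinct items, r = repulsion), and it repeatedly merges the pair of clusters with the globally largest gain in that quantity. The names `gain` and `hclope_sketch`, the default `r=2.0`, and the stopping rule (stop when no merge increases profit) are assumptions for illustration. The undirected graph optimisation mentioned in the abstract is not reproduced; the sketch scans all pairs by brute force instead.

```python
from collections import Counter
from itertools import combinations


def gain(items, n, r):
    """Per-cluster CLOPE term S * N / W**r (assumed profit definition)."""
    s = sum(items.values())   # S: total item occurrences in the cluster
    w = len(items)            # W: number of distinct items in the cluster
    return s * n / (w ** r)


def hclope_sketch(transactions, r=2.0):
    """Bottom-up merging by globally largest profit delta (illustrative only)."""
    # Start with one cluster per transaction:
    # cluster id -> (item histogram, transaction count, member indices)
    clusters = {i: (Counter(t), 1, [i]) for i, t in enumerate(transactions)}
    while len(clusters) > 1:
        best = None
        # Brute-force scan over all cluster pairs (no graph optimisation here).
        for a, b in combinations(clusters, 2):
            ca, na, _ = clusters[a]
            cb, nb, _ = clusters[b]
            delta = gain(ca + cb, na + nb, r) - gain(ca, na, r) - gain(cb, nb, r)
            if best is None or delta > best[0]:
                best = (delta, a, b)
        delta, a, b = best
        if delta <= 0:
            break  # assumed stopping rule: no merge improves the profit
        ca, na, ma = clusters[a]
        cb, nb, mb = clusters.pop(b)
        clusters[a] = (ca + cb, na + nb, ma + mb)
    return [members for _, _, members in clusters.values()]


if __name__ == "__main__":
    data = [["a", "b"], ["a", "b"], ["c", "d"], ["c", "d"]]
    print(hclope_sketch(data))
```

Because the total number of transactions is constant, maximising the change in the summed per-cluster terms is equivalent to maximising the change in overall profit. Ties between equally good merges are broken here by scan order, so this sketch alone does not guarantee order independence.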
Source
Computer Applications and Software (《计算机应用与软件》)
CSCD
2016, No. 7, pp. 60-63 (4 pages)
Keywords
HCLOPE
Categorical data
Hierarchical clustering
Stability
Undirected graph