Abstract
To better determine the optimal number of clusters in multi-dimensional data, this paper proposes a new algorithm, MHC, built on the principles of traditional methods for determining the number of clusters in a dataset. The algorithm uses a bottom-up strategy to generate partitions of the dataset at different levels. It computes the clustering quality of the partition at each level and selects the optimal number of clusters according to that quality. The paper also designs a new clustering validity index, Between-In-Proportion (BIP), which measures the clustering quality of different partitions and relies mainly on the geometric structure of the dataset. Theoretical analysis and experimental results show that the BIP index and the MHC algorithm are effective and can accurately determine the optimal number of clusters in multi-dimensional datasets.
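The bottom-up procedure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the paper's MHC implementation. The abstract gives neither the BIP formula nor the merge criterion, so the sketch merges clusters by average linkage and scores each level with a hypothetical between/within ratio: the mean distance between cluster centroids divided by the mean distance of points to their own centroid. The function names `average_linkage`, `between_in_ratio` and `choose_cluster_count`, and the default search bound of ⌊√n⌋ clusters, are likewise assumptions made for illustration.

```python
import math
from itertools import combinations
from typing import Optional

Point = tuple[float, ...]


def average_linkage(a: list[Point], b: list[Point]) -> float:
    """Mean pairwise Euclidean distance between two clusters (assumed merge criterion)."""
    return sum(math.dist(p, q) for p in a for q in b) / (len(a) * len(b))


def between_in_ratio(clusters: list[list[Point]]) -> float:
    """Hypothetical stand-in for a between/within validity index.

    NOT the paper's BIP formula, which the abstract does not give.
    Larger values mean well-separated, compact clusters.
    """
    centroids = [tuple(sum(coord) / len(c) for coord in zip(*c)) for c in clusters]
    n = sum(len(c) for c in clusters)
    # Mean distance from each point to the centroid of its own cluster.
    within = sum(math.dist(p, centroids[i]) for i, c in enumerate(clusters) for p in c) / n
    # Mean distance between every pair of cluster centroids.
    pairs = list(combinations(centroids, 2))
    between = sum(math.dist(a, b) for a, b in pairs) / len(pairs)
    return between / within if within > 0 else math.inf


def choose_cluster_count(points: list[Point], k_max: Optional[int] = None):
    """Merge clusters bottom-up; return (best k, {k: score}) for 2 <= k <= k_max."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    if k_max is None:
        k_max = max(2, math.isqrt(len(points)))  # assumed search bound
    clusters = [[p] for p in points]  # lowest level: every point is its own cluster
    scores: dict[int, float] = {}
    while len(clusters) > 1:
        k = len(clusters)
        if 2 <= k <= k_max:
            scores[k] = between_in_ratio(clusters)  # score the partition at this level
        # Merge the closest pair of clusters to form the next level up.
        i, j = min(combinations(range(k), 2),
                   key=lambda ij: average_linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    best_k = max(scores, key=scores.get)
    return best_k, scores
```

For example, with three well-separated square groups of four points each, the partition at k = 3 scores highest, so `choose_cluster_count` returns 3. Average linkage was picked only because it is a common agglomerative criterion; single or complete linkage would produce different intermediate levels, and a real BIP index would replace `between_in_ratio`.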
Source
Computer Engineering
Indexed in: CAS, CSCD
2012, No. 9, pp. 8-11 (4 pages)
Funding
Key project of the National "863" Program of China (2007AA010305)
Natural Science Basic Research Program of Shaanxi Province (SJ08-ZT14)
Scientific Research Program of the Shaanxi Provincial Department of Education (06JK229, 09JK683)
Keywords
multi-dimensional dataset
clustering number
clustering validity index
hierarchical clustering