Abstract
Traditional clustering algorithms based on representative points are sensitive to the shrinkage factor, and their number of clusters does not adapt to changes in the data. To address these problems, this paper studies agglomerative hierarchical clustering and proposes an approximate binary hierarchical clustering algorithm based on representative points, ABHCURE (Approximate Binary Hierarchical Clustering Using Representatives). The algorithm reduces the influence of outlier data points on the clustering result and eases the difficulty of determining the number of clusters. First, a single-layer multi-cluster merging mode improves the execution efficiency of hierarchical clustering. Second, a pseudo-noise mechanism collects the pseudo-noise data at each layer. This prevents outliers from being chosen as cluster representatives and distorting the original data distribution, which makes the algorithm more robust. Third, a dynamic minimum number of clusters balances the required number of clusters against the difficulty of determining it. Experimental results show that ABHCURE has a relatively short running time and a flexible number of clusters, and that it produces more accurate clustering results.
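The abstract gives no algorithmic details, so the Python sketch below only illustrates the ideas it names, under stated assumptions. The representative selection follows the standard CURE scheme: up to `c` well-scattered points per cluster are moved toward the centroid by the shrinkage factor `alpha`. The other three pieces are hypothetical stand-ins, not the paper's method. These are the per-layer greedy pairing (standing in for single-layer multi-cluster merging), the pseudo-noise rule (a singleton whose nearest-cluster distance exceeds `beta` times the median is set aside), and the stop at `k_min` (standing in for the dynamic minimum number of clusters). All function and parameter names are illustrative.

```python
import math
import statistics
from typing import List, Tuple

Point = Tuple[float, ...]
Cluster = List[Point]


def centroid(points: Cluster) -> Point:
    """Coordinate-wise mean of a cluster."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def representatives(points: Cluster, c: int = 4, alpha: float = 0.3) -> List[Point]:
    """CURE-style representatives: choose up to c well-scattered points with a
    farthest-point heuristic, then move each a fraction alpha (the shrinkage
    factor) toward the centroid."""
    m = centroid(points)
    reps: List[Point] = []
    for _ in range(min(c, len(points))):
        if not reps:
            nxt = max(points, key=lambda p: math.dist(p, m))
        else:
            nxt = max(points, key=lambda p: min(math.dist(p, r) for r in reps))
        reps.append(nxt)
    return [tuple(r_i + alpha * (m_i - r_i) for r_i, m_i in zip(r, m)) for r in reps]


def rep_distance(a: List[Point], b: List[Point]) -> float:
    """Cluster distance = closest pair of representative points."""
    return min(math.dist(p, q) for p in a for q in b)


def abhc_sketch(points: List[Point], k_min: int, c: int = 4,
                alpha: float = 0.3, beta: float = 3.0):
    """Hypothetical layer-wise clustering: pseudo-noise filtering, then greedy
    disjoint pair merging so each layer roughly halves the cluster count."""
    assert k_min >= 1
    clusters: List[Cluster] = [[p] for p in points]
    noise: List[Point] = []
    while len(clusters) > k_min:
        reps = [representatives(cl, c, alpha) for cl in clusters]
        n = len(clusters)
        d = [[rep_distance(reps[i], reps[j]) if i != j else math.inf
              for j in range(n)] for i in range(n)]
        nearest = [min(row) for row in d]
        cutoff = beta * statistics.median(nearest)
        # Pseudo-noise (hypothetical rule): isolated singletons are set aside
        keep = [i for i in range(n)
                if not (len(clusters[i]) == 1 and nearest[i] > cutoff)]
        if len(keep) < k_min:          # never drop below the minimum cluster count
            keep = list(range(n))
        noise.extend(clusters[i][0] for i in range(n) if i not in keep)
        # Greedy matching by ascending distance: each cluster merges at most once per layer
        pairs = sorted((d[i][j], i, j) for a, i in enumerate(keep) for j in keep[a + 1:])
        budget = len(keep) - k_min     # merges allowed before reaching k_min
        used, merged = set(), []
        for _, i, j in pairs:
            if budget == 0:
                break
            if i in used or j in used:
                continue
            used.update((i, j))
            merged.append(clusters[i] + clusters[j])
            budget -= 1
        merged.extend(clusters[i] for i in keep if i not in used)
        clusters = merged
    return clusters, noise
```

Consider two unit squares of points with corners at (0, 0) and (10, 10), plus a far point (50, 50), clustered with `k_min=2`. In the first layer the sketch sets (50, 50) aside as pseudo-noise, and it returns the two squares as the clusters. Greedy pairing merges each cluster at most once per layer, so n clusters shrink to roughly n/2 per layer. That is one possible reading of "approximate binary".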
Source
Journal of Chinese Computer Systems (《小型微型计算机系统》)
Indexed in: CSCD; Peking University Core Journals (北大核心)
2015, Issue 2, pp. 215-219 (5 pages)
Funding
Supported by a Key Project of the National Natural Science Foundation of China (61139002)
Keywords
hierarchical clustering
approximate binary
single-layer multi-cluster
pseudo-noise mechanism
number of clusters