Abstract
Traditional grid-based clustering algorithms yield low clustering quality, while most density-based clustering algorithms have high time complexity. To address the shortcomings of both families, this paper proposes a new clustering algorithm that combines their ideas. The algorithm introduces the edge density distance as a new density measure and, on that basis, defines what constitutes a cluster and how the clustering process proceeds. In the initial stage, the data are partitioned into grid cells and initial information about the data is recorded for use in the subsequent k-nearest-neighbor search. When searching for cluster centers, a strategy derived from bucket sort lets the algorithm select the next cluster center quickly. The clustering step then iteratively examines the not-yet-checked k nearest neighbors of the current cluster and tests whether they are density-reachable. Theoretical analysis and experimental results show that the proposed algorithm maintains high clustering accuracy while achieving a low, nearly linear time complexity.
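The grid-assisted k-nearest-neighbor step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the grid layout, the information recorded per cell, or the edge density distance itself, so the function names `build_grid` and `k_nearest`, the 2-D setting, and the ring-by-ring search below are all assumptions chosen for illustration. The sketch only shows how binning points into cells lets a neighbor query inspect nearby cells instead of the whole data set.

```python
import math
from collections import defaultdict

def build_grid(points, cell_size):
    """Map each grid cell (ix, iy) to the indices of the points inside it.
    (Hypothetical helper; the paper's recorded cell information is not given.)"""
    grid = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        grid[(math.floor(x / cell_size), math.floor(y / cell_size))].append(idx)
    return grid

def k_nearest(points, grid, cell_size, query_idx, k):
    """Return the indices of the k nearest neighbors of points[query_idx],
    scanning grid cells in growing square rings around the query's cell."""
    qx, qy = points[query_idx]
    cx, cy = math.floor(qx / cell_size), math.floor(qy / cell_size)
    # Farthest occupied cell, so the search stops once every cell was visited.
    max_ring = max(max(abs(ix - cx), abs(iy - cy)) for ix, iy in grid)
    found = []  # (distance, point index)
    for ring in range(max_ring + 1):
        for ix in range(cx - ring, cx + ring + 1):
            for iy in range(cy - ring, cy + ring + 1):
                if max(abs(ix - cx), abs(iy - cy)) != ring:
                    continue  # visit only the cells on the new ring
                for idx in grid.get((ix, iy), ()):
                    if idx != query_idx:
                        px, py = points[idx]
                        found.append((math.hypot(px - qx, py - qy), idx))
        found.sort()
        # Any point outside the scanned rings is more than ring * cell_size
        # away, so the current k best candidates are final.
        if len(found) >= k and found[k - 1][0] <= ring * cell_size:
            break
    return [idx for _, idx in found[:k]]

points = [(0.1, 0.1), (0.2, 0.1), (0.9, 0.9), (3.5, 3.5), (3.6, 3.5), (0.1, 0.3)]
grid = build_grid(points, cell_size=1.0)
neighbors = k_nearest(points, grid, 1.0, query_idx=0, k=2)
```

With a cell size matched to the data density, each query touches only a few cells, which is the kind of saving a near-linear overall running time depends on. The bucket-sort selection of cluster centers and the density-reachability test are not shown, since the abstract gives no definitions from which to reconstruct them.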
Source
《计算机科学》
CSCD
PKU Core (Peking University Core Journals)
2014, Issue 8, pp. 245-249 (5 pages)
Computer Science
Funding
Supported by the Zhejiang Provincial Key Science and Technology Innovation Team Project (2010R50009)