Abstract
Traditional cluster analysis methods generally do not address large data sets, yet one of the central concerns of data mining research is how to extract knowledge efficiently from massive amounts of data. Combining the partitioning-based K-means center-point algorithm with the hierarchy-based BIRCH incremental algorithm, this paper proposes the idea of a core tree (Core-Tree) to compensate for the shortcomings of both algorithms: center points are used to represent the summary information in BIRCH, and class cores are used to make determining the center points more efficient. The resulting clustering algorithm focuses on improving clustering efficiency for large data sets and on handling data sets with a variety of characteristics.
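As a rough illustration of the "summary information" mentioned in the abstract, the sketch below shows a standard BIRCH-style clustering feature (point count, linear sum, sum of squared norms), from which a center point can be derived incrementally without storing the points. It is not the paper's Core-Tree algorithm, whose details are not given in this record; the class name `ClusteringFeature` and its methods are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ClusteringFeature:
    """Hypothetical BIRCH-style summary: (n, linear sum, sum of squared norms)."""

    dim: int
    n: int = 0
    linear_sum: List[float] = field(default_factory=list)
    square_sum: float = 0.0

    def __post_init__(self) -> None:
        # Start from an all-zero linear sum when none is supplied.
        if not self.linear_sum:
            self.linear_sum = [0.0] * self.dim

    def add(self, point: List[float]) -> None:
        """Absorb one point incrementally; the point itself is not stored."""
        self.n += 1
        for i, x in enumerate(point):
            self.linear_sum[i] += x
        self.square_sum += sum(x * x for x in point)

    def merge(self, other: "ClusteringFeature") -> None:
        """Merge another summary into this one (clustering features are additive)."""
        self.n += other.n
        for i, s in enumerate(other.linear_sum):
            self.linear_sum[i] += s
        self.square_sum += other.square_sum

    def center(self) -> List[float]:
        """Center point (centroid) recovered from the summary alone."""
        if self.n == 0:
            raise ValueError("empty clustering feature has no center")
        return [s / self.n for s in self.linear_sum]

    def radius(self) -> float:
        """Root-mean-square distance of the summarized points from the center."""
        c = self.center()
        variance = self.square_sum / self.n - sum(x * x for x in c)
        return max(variance, 0.0) ** 0.5  # clamp tiny negative rounding errors
```

Because the three components simply add up, two summaries can be merged in constant time per dimension, which is what makes this kind of structure suitable for incremental clustering of large data sets.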
Source
Journal of Hubei Normal University (Natural Science)
2011, No. 2, pp. 18-23 (6 pages)
Keywords
incremental clustering
core tree (Core-Tree)
center point
clustering feature