Abstract
Based on an analysis of the characteristics of grid-based and density-based clustering methods, a hybrid clustering algorithm based on grid and density is proposed. By clustering in two phases and expanding clusters from only a small number of seed objects selected in representative units, the algorithm reduces the number of region queries and therefore the running time. An equivalence rule is proposed for converting smoothly between the clustering parameters of the two phases. The algorithm retains the advantages of density-based clustering, namely the ability to discover clusters of arbitrary shape and insensitivity to noise, together with the efficiency of grid-based clustering, which makes it suitable for mining large-scale data. Its application to the analysis of accelerometer data demonstrates its effectiveness. The work offers guidance for applying data mining to equipment condition monitoring and fault diagnosis.
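The two-phase idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the function name `hybrid_cluster`, the parameters `cell_size`, `min_pts` and `eps`, the choice of the point nearest the cell centre as the seed object, and the merging rule are all assumptions made for illustration, and the paper's rule for converting parameters between the two phases is not reproduced.

```python
from collections import defaultdict
from itertools import product
from math import ceil, dist, floor


def hybrid_cluster(points, cell_size, min_pts, eps):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    # Phase 1 (grid): bin point indices into cells and keep the dense cells.
    cells = defaultdict(list)
    for i, p in enumerate(points):
        cells[tuple(floor(c / cell_size) for c in p)].append(i)
    dense = {k for k, members in cells.items() if len(members) >= min_pts}

    # Union-find over dense cells, used to merge cells into clusters.
    parent = {k: k for k in dense}

    def find(k):
        while parent[k] != k:
            parent[k] = parent[parent[k]]  # path halving
            k = parent[k]
        return k

    # Phase 2 (density): one eps-region query per dense cell, around its seed.
    reach = ceil(eps / cell_size)  # how many cells the eps radius can span
    border = {}  # point index in a sparse cell -> dense cell that reached it
    for k in dense:
        centre = [(c + 0.5) * cell_size for c in k]
        # Assumed seed choice: the member point closest to the cell centre.
        seed = min(cells[k], key=lambda i: dist(points[i], centre))
        for offset in product(range(-reach, reach + 1), repeat=len(k)):
            nk = tuple(a + b for a, b in zip(k, offset))
            hits = [j for j in cells.get(nk, [])
                    if dist(points[seed], points[j]) <= eps]
            if not hits:
                continue
            if nk in dense:
                parent[find(nk)] = find(k)  # merge the two dense cells
            else:
                for j in hits:
                    border.setdefault(j, k)  # attach border point to first seed

    # Turn merged cells into consecutive cluster ids; unreached points stay -1.
    ids = {}
    labels = [-1] * len(points)
    for k in dense:
        cid = ids.setdefault(find(k), len(ids))
        for i in cells[k]:
            labels[i] = cid
    for j, k in border.items():
        labels[j] = ids[find(k)]
    return labels
```

In this sketch the number of region queries equals the number of dense cells rather than the number of points, which is the kind of saving the abstract attributes to seed objects. A point-by-point density-based method would instead issue one query per point.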
Source
《四川大学学报(工程科学版)》
EI
CAS
CSCD
PKU Core (Peking University Core Journals)
2006, Issue 5, pp. 156-161 (6 pages)
Journal of Sichuan University (Engineering Science Edition)
Funding
Supported by the National Natural Science Foundation of China (Grant No. 50575153)
Keywords
data mining
clustering
seed object