Abstract
Data mining extracts information of interest from very large databases (VLDB). Clustering is an important data mining tool: it partitions a database into groups according to the similarity between data objects, so that the objects within each group are as similar as possible. From a machine learning perspective, clusters correspond to hidden patterns, and searching for clusters is an unsupervised learning process. Dozens of clustering algorithms are currently in use in fields such as statistics, pattern recognition, and machine learning. This paper surveys the clustering algorithms used in data mining, classifies them into seven categories, and analyzes the performance characteristics of each.
Source
Journal of Electronics & Information Technology (《电子与信息学报》)
EI
CSCD
PKU Core Journals (北大核心)
2005, No. 4, pp. 655-662 (8 pages)
Funding
Supported by the National Natural Science Foundation of China (Grant No. 60002003)
Keywords
Data mining
Clustering
Hierarchical clustering
Partitioning clustering
K-Means