Abstract
Cluster analysis aims to identify groups of similar objects and therefore helps to discover the distribution of patterns and interesting correlations in large data sets. Because it arises in many application domains in engineering, business, and the social sciences, it has been the subject of extensive research. In recent years especially, the availability of huge transactional and experimental data sets and the growing demands of data mining have created a need for clustering algorithms that scale and can be applied in diverse domains. This paper introduces the fundamental concepts of clustering and an important issue in the clustering process: assessing the quality of clustering results. It also introduces cluster validity methods based on external criteria. Finally, because cluster validity assessment leads to high computational complexity, Monte Carlo techniques are used to reduce that cost.
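The abstract only names external-criteria validation and Monte Carlo techniques, so the following is a minimal sketch under stated assumptions, not the paper's method. It assumes the common pair-counting formulation, in which a clustering C is compared with a prespecified partition P by classifying every pair of points as SS (together in both), SD, DS, or DD (apart in both), and the Rand statistic, Jaccard coefficient, and Fowlkes-Mallows index are computed from those counts. For the Monte Carlo step it assumes a label-permutation null model (randomly reassigning C's labels while keeping cluster sizes) and estimates how often a random clustering scores at least as well as the observed one. All function names and the choice of null model are hypothetical illustrations. Each index evaluation is O(N²) in the number of points and is repeated for every trial, which shows why the computational cost grows quickly.

```python
import random
from itertools import combinations


def pair_counts(c, p):
    """Classify all point pairs by agreement between clustering c and partition p.

    Returns (a, b, c_, d):
      a  = SS: same cluster in c, same group in p
      b  = SD: same cluster in c, different groups in p
      c_ = DS: different clusters in c, same group in p
      d  = DD: different in both
    """
    a = b = c_ = d = 0
    for i, j in combinations(range(len(c)), 2):
        same_c = c[i] == c[j]
        same_p = p[i] == p[j]
        if same_c and same_p:
            a += 1
        elif same_c:
            b += 1
        elif same_p:
            c_ += 1
        else:
            d += 1
    return a, b, c_, d


def rand_index(c, p):
    """Rand statistic: fraction of pairs on which c and p agree."""
    a, b, c_, d = pair_counts(c, p)
    return (a + d) / (a + b + c_ + d)


def jaccard_index(c, p):
    """Jaccard coefficient: a / (a + b + c_), ignoring DD pairs."""
    a, b, c_, _ = pair_counts(c, p)
    # No co-clustered pair in either partition means both are all singletons;
    # treat that as perfect agreement.
    return a / (a + b + c_) if a + b + c_ else 1.0


def fowlkes_mallows(c, p):
    """Fowlkes-Mallows index: a / sqrt((a + b) * (a + c_))."""
    a, b, c_, _ = pair_counts(c, p)
    m1, m2 = a + b, a + c_
    # Undefined when either partition has no co-clustered pair; 0.0 by convention here.
    return a / (m1 * m2) ** 0.5 if m1 and m2 else 0.0


def monte_carlo_p_value(c, p, index=rand_index, trials=1000, seed=0):
    """Estimate significance of index(c, p) under a label-permutation null.

    Shuffling c's labels keeps the cluster sizes but destroys any real
    structure. The p-value is the (smoothed) fraction of random clusterings
    that score at least as high as the observed clustering.
    """
    rng = random.Random(seed)
    observed = index(c, p)
    shuffled = list(c)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if index(shuffled, p) >= observed:
            hits += 1
    return observed, (hits + 1) / (trials + 1)
```

For example, `pair_counts([0, 0, 1, 1], [0, 1, 0, 1])` gives `(0, 2, 2, 2)`, so the Rand statistic is 1/3 while the Jaccard coefficient is 0.0. The Jaccard coefficient ignores DD pairs, so it is not inflated by the many pairs that are simply apart in both partitions. A small p-value from `monte_carlo_p_value` suggests that the agreement with P is unlikely under random labeling.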
Source
Journal of Changchun University of Science and Technology (Natural Science Edition), 2008, Issue 1, pp. 77-80 (4 pages)