Abstract
CLARA is a k-medoids clustering algorithm. On large data sets it scales better than PAM (Partitioning Around Medoids), but its random sampling step can produce samples that do not represent the data accurately. To address this weakness, this paper applies the concept of a data field to improve CLARA's sampling accuracy, which makes the algorithm better suited to large multidimensional data sets and improves the quality of the mining results.
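For orientation, the baseline CLARA procedure that the abstract refers to can be sketched as follows. This is a minimal illustration and not the paper's method: the data-field-based sampling is not described in the abstract, so only plain random sampling is shown. The function names (`pam`, `clara`, `total_cost`), the simplified greedy swap search, and the default of 5 samples of size 40 + 2k are assumptions chosen for the sketch.

```python
import random

def dist(a, b):
    # Euclidean distance between two points given as tuples
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def total_cost(points, medoids):
    # Sum of distances from every point to its nearest medoid
    return sum(min(dist(p, m) for m in medoids) for p in points)

def pam(points, k, rng):
    # Simplified PAM (assumption): start from random medoids and
    # accept any single swap that lowers the total cost
    medoids = rng.sample(points, k)
    best = total_cost(points, medoids)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for cand in points:
                if cand in medoids:
                    continue
                trial = medoids[:i] + [cand] + medoids[i + 1:]
                cost = total_cost(points, trial)
                if cost < best:
                    medoids, best, improved = trial, cost, True
    return medoids

def clara(points, k, n_samples=5, sample_size=None, seed=0):
    # Baseline CLARA: run PAM on several random samples and keep the
    # medoid set that is cheapest when scored on the FULL data set
    rng = random.Random(seed)
    if sample_size is None:
        sample_size = min(len(points), 40 + 2 * k)
    best_medoids, best_cost = None, float("inf")
    for _ in range(n_samples):
        sample = rng.sample(points, sample_size)  # the random sampling step
        medoids = pam(sample, k, rng)
        cost = total_cost(points, medoids)
        if cost < best_cost:
            best_medoids, best_cost = medoids, cost
    return best_medoids, best_cost
```

The quality of the result depends on whether each random sample happens to contain good medoid candidates, which is the weakness the paper targets.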
Source
《现代计算机》
2006, No. 6, pp. 19-21, 36 (4 pages in total)
Modern Computer