Abstract
PAM (Partitioning Around Medoids) is a popular k-medoid clustering algorithm that is robust and accurate when clustering data sets. Its main drawback is the high computational cost of determining the set of medoids, which makes clustering slow on large data sets. This paper proposes a fast PAM algorithm that uses three search strategies, Partial Distance Search (PDS), Previous Medoid Index (PMI), and the Triangular Inequality Elimination (TIE) criterion, to reduce the computation required in each medoid iteration. Experimental results show that, compared with the basic PAM algorithm and with exactly the same clustering quality, the new algorithm reduces the number of multiplications by 70% to 90% and saves about one third or more of the running time.
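As a rough illustration of how the three strategies named in the abstract can cut distance computations, the sketch below applies them to a single nearest-medoid lookup. It is a generic textbook-style sketch, not the paper's implementation: the function names (`pairwise_medoid_distances`, `partial_sq_dist`, `nearest_medoid`), the use of Euclidean distance, and the order in which the tests are applied are all assumptions made here.

```python
import math


def pairwise_medoid_distances(medoids):
    """Precompute Euclidean distances between all pairs of medoids (needed for TIE)."""
    return [[math.dist(a, b) for b in medoids] for a in medoids]


def partial_sq_dist(x, m, bound):
    """PDS: accumulate the squared distance one dimension at a time and
    give up (return None) as soon as the running sum reaches `bound`."""
    total = 0.0
    for a, b in zip(x, m):
        total += (a - b) ** 2
        if total >= bound:
            return None
    return total


def nearest_medoid(x, medoids, medoid_dists, prev_index=0):
    """Return (index, distance) of the medoid nearest to point x.

    prev_index is the medoid x was assigned to in the previous iteration
    (PMI); starting from it usually gives a tight bound right away.
    """
    best = prev_index
    best_sq = sum((a - b) ** 2 for a, b in zip(x, medoids[best]))
    best_d = math.sqrt(best_sq)
    for j, m in enumerate(medoids):
        if j == prev_index:
            continue
        # TIE: if d(m_best, m_j) >= 2 * d(x, m_best), the triangle inequality
        # gives d(x, m_j) >= d(x, m_best), so m_j cannot be strictly closer.
        if medoid_dists[best][j] >= 2.0 * best_d:
            continue
        d_sq = partial_sq_dist(x, m, best_sq)
        if d_sq is not None:
            best, best_sq, best_d = j, d_sq, math.sqrt(d_sq)
    return best, best_d


medoids = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
dists = pairwise_medoid_distances(medoids)
index, distance = nearest_medoid((9.0, 1.0), medoids, dists, prev_index=0)
```

Because both tests only discard candidates that cannot be strictly closer than the current best, the lookup returns the same nearest medoid as an exhaustive search (up to ties), which is consistent with the abstract's claim that clustering quality is unchanged.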
Source
Computer Applications and Software (《计算机应用与软件》)
Indexed in: CSCD; Peking University Core Journals (北大核心)
2008, No. 9, pp. 8-11 (4 pages)
Funding
National Natural Science Foundation of China (Grant No. 60673082)