摘要
针对传统次胜者受罚竞争学习(RPCL)算法忽略数据集几何结构对节点权值调整的影响,以及魏立梅等提出的新RPCL算法(魏立梅,谢维信.聚类分析中竞争学习的一种新算法.电子科学学刊,2000,22(1):13-18)引入密度来对节点的权值进行调整时,密度定义的主观性,提出基于样本空间分布密度的改进RPCL算法。该算法根据数据集样本自然分布定义样本密度,将此密度引入RPCL节点权值调整;使用UCI机器学习数据库数据集以及随机生成的带有噪声点的人工模拟数据集对算法进行实验测试,对算法确定数据集类簇数目的准确率、运行时间、聚类误差平方和、聚类结果的Rand指数、Jaccard系数以及Adjust Rand index参数进行分析比较。各项实验结果显示:所提算法优于原始RPCL算法和魏立梅算法,具有更好的聚类效果,对噪声数据有很强的抗干扰性能。所提算法不仅能根据样本的自然分布确定数据集的合理类簇数目,而且能确定合适的类簇中心,提高聚类的准确性,使聚类结果尽可能快地收敛到全局最优解。
The original Rival Penalized Competitive Learning(RPCL) algorithm ignores the influence of the geometry structure of a dataset on the weight variation of its nodes.A new RPCL algorithm proposed by Wei Limei et al.(WEI LIMEI,XIE WEIXIN.A new competitive learning algorithm for clustering analysis.Journal of Electronics,2000,22(1): 13-18) overcame the drawback of the original RPCL by introducing the density of samples to adjust the weights of nodes,while the density was not much objective.This paper defined a new density for a sample according to the pattern distribution of samples in a dataset,and introduced the density into the adjusting for the weights of nodes in RPCL to overcome the disadvantages of the available RPCL algorithms.The authors' improved RPCL algorithm was tested on some well-known datasets from UCI machine learning repository and on some synthetic data sets with noisy samples.The accuracy of determining the number of clusters of a dataset and the run time and the clustering error of the algorithms were compared.The Rand index,the Jaccard coefficient and the Adjust Rand index were used to analyze the performance of the algorithms.The experimental results show that the improved RPCL algorithm outperforms the original RPCL and the new RPCL proposed by WEI LIMEI et al.greatly,and achieves much better clustering results and has a stronger anti-interference performance for noisy data than that of the other two RPCL algorithms.All the analyses demonstrate that the improved RPCL algorithm can not only determine the right number of clusters for a dataset according to its sample distribution,but also uncover the suitable centers of clusters and advance the clustering accuracy as well as approximate the global optimal clustering result as fast as possible.
出处
《计算机应用》
CSCD
北大核心
2012年第3期638-642,共5页
journal of Computer Applications
基金
中央高校基本科研业务费专项资金资助项目(GK200901006
GK201001003)
陕西省自然科学基础研究计划项目(2010JM3004)
关键词
聚类
次胜者受罚竞争学习算法
样本密度
聚类数目
聚类中心
clustering
Rival Penalized Competitive Learning(RPCL) algorithm
sample density
cluster number
cluster center