Abstract
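Traditional clustering algorithms, such as FCM (Fuzzy C-Means), are learning and classification algorithms designed for a single independent dataset. In practice, a dataset is independent of other datasets, yet it often cooperates with them by exchanging information. The clustering process therefore needs to take the influence of other datasets into account in order to obtain a data structure that better reflects reality. This paper proposes an information-gain method, based on information theory, to model and quantitatively analyze the cooperative relationships among multiple datasets. On this basis, a corresponding new cooperative clustering algorithm, CCA (Cooperative Clustering Algorithm), is derived. Theoretical analysis shows that the algorithm eventually converges, and experimental results further demonstrate the feasibility and effectiveness of the cooperative clustering algorithm.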
Conventional clustering algorithms, e.g. the Fuzzy C-Means (FCM) clustering algorithm, are designed for a single independent dataset. In the real world, a dataset is independent of other datasets but can sometimes cooperate with them by exchanging information, as in the relationship between subsidiary companies. The influence of other related, collaborating datasets should therefore be considered when clustering under such collaborative circumstances. This paper discusses two different collaborative models and proposes new methods, e.g. information gain, to quantitatively measure such collaboration between datasets. The corresponding collaborative clustering algorithms are presented, and theoretical analysis shows that they converge to a local minimum. Experimental results demonstrate that, because collaboration is taken into account, the clustering structures obtained by the new cooperative algorithms differ from those of conventional algorithms, and that under cooperating circumstances the collaborative clustering algorithms can perform much better than conventional “single” clustering algorithms.
Source
《计算机学报》
EI
CSCD
Peking University Core Journals (北大核心)
2005, No. 8, pp. 1287-1294 (8 pages)
Chinese Journal of Computers
Funding
Supported by the Sino-French Advanced Program project fund (PRASI03-02)
Keywords
information theory
clustering
fuzzy
pattern recognition