An Incremental Fuzzy Clustering Algorithm for Large-Scale Data Sets (cited by: 17)

Incremental fuzzy (c+p)-means clustering for large data
Abstract: The FCPM algorithm has been applied successfully to fuzzy system modeling, but it clusters poorly on large data sets in which the cluster centers of one class are known. To overcome this drawback, and drawing on the single-pass fuzzy c-means (SPFCM) and online fuzzy c-means (OFCM) clustering algorithms, this paper proposes an incremental fuzzy clustering algorithm for large data sets, called incremental fuzzy (c+p)-means clustering, IFCM(c+p). Each data block is first clustered with FCPM; the cluster centers of that block, together with some of the sample points near them, are then added to the next data block to take part in its clustering, and a balance factor is introduced to improve clustering performance. Compared with SPFCM, OFCM, and rseFCM, IFCM(c+p) is not sensitive to the initial cluster centers. Experiments indicate that IFCM(c+p) compares favorably with SPFCM and rseFCM in clustering performance without a large increase in running time, so it is especially suitable for large data sets in which the cluster centers of one class are known.
Authors: Li Tao (李滔), Wang Shitong (王士同)
Source: CAAI Transactions on Intelligent Systems (《智能系统学报》), CSCD, PKU Core, 2016, No. 2, pp. 188-199 (12 pages)
Funding: National Natural Science Foundation of China (61272210)
Keywords: incremental fuzzy clustering; FCPM; IFCM(c+p); balance factor; large data
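
The block-wise procedure described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: plain weighted fuzzy c-means stands in for FCPM (the handling of known cluster centers is not reproduced), the balance factor is omitted, and the names `memberships`, `weighted_fcm`, `incremental_fcm`, `init_centers`, and `k_near` are hypothetical.

```python
import math
import random


def memberships(points, centers, m=2.0):
    """Fuzzy memberships u[i][j] of each point i in each cluster j."""
    u = []
    for x in points:
        # Squared distances, floored so a point sitting on a center is safe.
        d2 = [max(sum((a - b) ** 2 for a, b in zip(x, v)), 1e-12) for v in centers]
        inv = [d ** (-1.0 / (m - 1.0)) for d in d2]
        total = sum(inv)
        u.append([t / total for t in inv])
    return u


def weighted_fcm(points, weights, centers, m=2.0, iters=100, tol=1e-6):
    """Weighted fuzzy c-means, started from the given centers.

    Assumption: this replaces FCPM, whose details are not in the abstract.
    """
    dim = len(points[0])
    centers = [tuple(v) for v in centers]
    for _ in range(iters):
        u = memberships(points, centers, m)
        new_centers = []
        for j in range(len(centers)):
            # Each point contributes with coefficient w_i * u_ij^m.
            coef = [w * row[j] ** m for w, row in zip(weights, u)]
            total = sum(coef)
            new_centers.append(tuple(
                sum(k * x[t] for k, x in zip(coef, points)) / total
                for t in range(dim)))
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships(points, centers, m)


def incremental_fcm(blocks, init_centers, k_near=3, m=2.0):
    """Cluster blocks one at a time, carrying a weighted summary forward."""
    centers = [tuple(v) for v in init_centers]
    carried_pts, carried_w = [], []
    for block in blocks:
        block = [tuple(x) for x in block]
        pts = carried_pts + block
        wts = carried_w + [1.0] * len(block)
        centers, u = weighted_fcm(pts, wts, centers, m)
        carried_pts, carried_w = [], []
        for j, v in enumerate(centers):
            # The center stands in for the data seen so far in cluster j;
            # its weight is the accumulated membership mass (an assumption).
            carried_pts.append(v)
            carried_w.append(sum(w * row[j] for w, row in zip(wts, u)))
            # Also carry the k_near raw samples of this block closest to it.
            nearest = sorted(block, key=lambda x: math.dist(x, v))[:k_near]
            carried_pts.extend(nearest)
            carried_w.extend([1.0] * len(nearest))
    return centers


# Usage on synthetic data: two Gaussian blobs split into four blocks.
rng = random.Random(0)
data = [(rng.gauss(cx, 0.3), rng.gauss(cy, 0.3))
        for cx, cy in [(0.0, 0.0), (5.0, 5.0)] for _ in range(200)]
rng.shuffle(data)
blocks = [data[i:i + 100] for i in range(0, len(data), 100)]
centers = incremental_fcm(blocks, init_centers=[(1.0, 1.0), (4.0, 4.0)])
```

Only one block plus a small carried summary is held in memory at a time, which is the point of the incremental design. How the carried centers and samples should be weighted against new data is exactly where the paper's balance factor would enter; the unit weights used here are a placeholder.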