Abstract
To address the clustering of high-dimensional data whose attributes are interval-scaled variables, this paper proposes FD-CABOSFV, an improved version of the CABOSFV algorithm based on fuzzy discretization. For each attribute portfolio (combination of attributes), attribute values are discretized following the idea of fuzzy C-means clustering, and each object's membership in the discretized attribute categories is determined by a λ-cut. The attribute values are thereby converted into binary variables, which are then clustered with the CABOSFV algorithm. FD-CABOSFV is compared with the well-known K-means clustering algorithm on three UCI benchmark data sets; the experimental results show that FD-CABOSFV is more effective.
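The discretization step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a single interval-scaled attribute (the paper works on attribute portfolios), a standard fuzzy C-means update with fuzzifier m = 2, evenly spaced initial centers, and an illustrative threshold λ = 0.5. The function names `memberships`, `fuzzy_c_means_1d`, and `lambda_cut` are hypothetical, and the final CABOSFV clustering step is not shown.

```python
def memberships(values, centers, m=2.0):
    """Standard fuzzy C-means memberships; each row sums to 1."""
    power = 2.0 / (m - 1.0)
    rows = []
    for x in values:
        dists = [abs(x - v) for v in centers]
        if min(dists) == 0.0:
            # x coincides with one or more centers: share full membership among them
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            rows.append([h / sum(hits) for h in hits])
        else:
            inv = [d ** -power for d in dists]
            total = sum(inv)
            rows.append([w / total for w in inv])
    return rows


def fuzzy_c_means_1d(values, c=3, m=2.0, iters=100, tol=1e-6):
    """Fuzzy C-means on one attribute (assumed simplification of the paper's setting)."""
    lo, hi = min(values), max(values)
    # Deterministic initialization (an assumption): centers evenly spaced over the range
    centers = [lo + (hi - lo) * (k + 0.5) / c for k in range(c)]
    for _ in range(iters):
        u = memberships(values, centers, m)
        new_centers = []
        for k in range(c):
            weights = [row[k] ** m for row in u]
            total = sum(weights)
            if total == 0.0:
                new_centers.append(centers[k])  # keep an empty cluster's center unchanged
            else:
                new_centers.append(sum(w * x for w, x in zip(weights, values)) / total)
        shift = max(abs(a - b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships(values, centers, m)


def lambda_cut(u, lam=0.5):
    """Lambda-cut: membership >= lam becomes 1, otherwise 0 (binary variables)."""
    return [[1 if uik >= lam else 0 for uik in row] for row in u]


# Illustrative data only; not taken from the paper's experiments.
values = [1.0, 1.2, 0.9, 5.0, 5.1, 4.9, 9.0, 9.2, 8.8]
centers, u = fuzzy_c_means_1d(values, c=3)
binary = lambda_cut(u, lam=0.5)
for x, row in zip(values, binary):
    print(x, row)
```

Each row of `binary` is the binary representation of one object over the discretized categories, which is the kind of input a binary-attribute clustering algorithm such as CABOSFV operates on. Depending on the chosen λ, an object may fall into more than one category or into none; how the paper handles such cases is not stated in the abstract.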
Source
China Journal of Information Systems (《信息系统学报》)
2012, Issue 1, pp. 77-87 (11 pages)
Funding
National Natural Science Foundation of China (70771007)
Fundamental Research Funds for the Central Universities (FRF-TP-10-006B)
Keywords
CABOSFV algorithm
Attribute portfolio
Fuzzy discretization
Interval-scaled variable