Abstract
Classical fuzzy C-means clustering is sensitive to noisy data, ignores the imbalance among the attribute features of samples, and performs poorly on high-dimensional data. Possibilistic clustering addresses noise sensitivity and the consistency-of-clustering problem, but it assumes that every sample contributes equally to the clustering. To address these problems, a sample-feature weighted possibilistic fuzzy kernel clustering algorithm is proposed. Possibilistic clustering is incorporated into fuzzy clustering to improve robustness to noise and outliers. At the same time, according to the specific characteristics of each cluster, the algorithm dynamically computes a weight for the importance of each attribute feature to each cluster and a weight for the importance of each sample to the clustering. It also optimizes the choice of kernel parameters, repeatedly adjusting the kernel function so that a data set that is not linearly separable in the original space is mapped to a separable data set in a high-dimensional space. Experimental results show that the sample-feature weighted possibilistic fuzzy kernel clustering algorithm reduces the influence of noisy data and outliers and achieves better clustering accuracy than traditional clustering algorithms.
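The abstract gives no formulas, so the following is only a minimal sketch of the ingredients it names, not the paper's actual update rules. It assumes a Gaussian kernel with a single per-feature weight vector `w` (the paper describes per-cluster feature weights and additional sample weights, which are omitted here), the standard kernel-induced distance, the usual fuzzy C-means membership formula, and the usual possibilistic typicality formula. All function names and the parameters `sigma`, `eta`, `m`, and `p` are hypothetical.

```python
import math

def weighted_gaussian_kernel(x, v, w, sigma):
    """Gaussian kernel with per-feature weights w (assumed form, not from the paper)."""
    sq = sum(wk * (xk - vk) ** 2 for xk, vk, wk in zip(x, v, w))
    return math.exp(-sq / (2.0 * sigma ** 2))

def kernel_distance_sq(x, v, w, sigma):
    # For a Gaussian kernel K(x, x) = 1, so
    # ||phi(x) - phi(v)||^2 = K(x, x) + K(v, v) - 2 K(x, v) = 2 * (1 - K(x, v)).
    return 2.0 * (1.0 - weighted_gaussian_kernel(x, v, w, sigma))

def fuzzy_memberships(x, centers, w, sigma, m=2.0, eps=1e-12):
    """FCM-style memberships of one sample; they sum to 1 across clusters."""
    inv = [(1.0 / (kernel_distance_sq(x, v, w, sigma) + eps)) ** (1.0 / (m - 1.0))
           for v in centers]
    total = sum(inv)
    return [a / total for a in inv]

def typicalities(x, centers, w, sigma, eta, p=2.0):
    """Possibilistic typicalities; there is no sum-to-one constraint,
    so a point far from every center gets a low value for every cluster."""
    return [1.0 / (1.0 + (kernel_distance_sq(x, v, w, sigma) / eta_i) ** (1.0 / (p - 1.0)))
            for v, eta_i in zip(centers, eta)]

# Illustrative data (made up): two centers, one inlier, one outlier.
centers = [[0.0, 0.0], [5.0, 5.0]]
w = [1.0, 1.0]
inlier, outlier = [0.2, -0.1], [50.0, -40.0]
u_out = fuzzy_memberships(outlier, centers, w, sigma=2.0)
t_in = typicalities(inlier, centers, w, sigma=2.0, eta=[1.0, 1.0])
t_out = typicalities(outlier, centers, w, sigma=2.0, eta=[1.0, 1.0])
```

In this sketch the outlier still receives fuzzy memberships that sum to 1, whereas its typicality is lower than the inlier's for the nearest center, which is the intuition behind combining possibilistic and fuzzy clustering for noise robustness.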
Source
《计算机工程与科学》
CSCD
Peking University Core Journal
2014, No. 1, pp. 169-175 (7 pages)
Computer Engineering & Science
Funding
Natural Science Foundation of Jiangxi Province (20114BAB201028)
East China Jiaotong University Research Fund (11QT04)
Keywords
sample weighting
feature weighting
fuzzy C-means
possibilistic fuzzy clustering
kernel function