Abstract
Because human cognition is limited and information is uncertain, traditional fuzzy clustering cannot effectively handle real-world decision-making problems, which has motivated clustering algorithms for hesitant fuzzy sets (HFSs). The concept of hesitant fuzzy sets evolved from fuzzy sets, which are applied in the fuzzy linguistic approach. The existing hierarchical hesitant fuzzy K-means clustering algorithm does not use information from the data set itself to determine the weights of its distance function, so every attribute receives the same weight, and both the computational complexity and the space complexity of computing cluster centers are exponential, which makes the algorithm unsuitable for big data environments. To address these problems, this paper proposes a weighted hesitant fuzzy clustering algorithm based on density peaks, called WHFDP. First, a method is given for extending the shorter hesitant fuzzy element sets so that the distance between two HFSs can be calculated, and a new formula for the weights of the distance function is derived from the coefficient of variation. Then, cluster centers are selected with the density peaks clustering method, which reduces the complexity of computing cluster centers and improves adaptability to data sets of different sizes and arbitrary shapes; the time complexity and space complexity of the algorithm are reduced to polynomial level. Finally, simulation experiments on typical data sets demonstrate the effectiveness of the proposed algorithm.
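The pipeline summarized in the abstract (pad the shorter hesitant fuzzy elements, weight attributes by coefficient of variation, pick cluster centers by density peaks) can be illustrated with a minimal Python sketch. The abstract does not state the paper's actual formulas, so everything concrete here is an assumption made for illustration: padding repeats the largest (or smallest) membership value, weights are the normalized coefficients of variation of per-object mean memberships, the distance is a weighted normalized Hamming distance, and density uses a Gaussian kernel with a hypothetical cutoff `d_c`. All function names, including `whfdp_sketch`, are hypothetical and are not taken from the paper.

```python
import math
from statistics import mean, pstdev


def pad_hfe(hfe, length, optimistic=True):
    """Pad a hesitant fuzzy element to `length` by repeating its max (or min) value."""
    values = sorted(hfe)
    filler = values[-1] if optimistic else values[0]  # assumed padding rule
    return sorted(values + [filler] * (length - len(values)))


def pad_dataset(data, optimistic=True):
    """data[i][j] is the list of membership degrees of object i on attribute j."""
    n_attr = len(data[0])
    lengths = [max(len(obj[j]) for obj in data) for j in range(n_attr)]
    return [[pad_hfe(obj[j], lengths[j], optimistic) for j in range(n_attr)]
            for obj in data]


def cv_weights(padded):
    """Assumed weighting: per-attribute coefficient of variation, normalized to sum to 1."""
    n_attr = len(padded[0])
    cvs = []
    for j in range(n_attr):
        scores = [mean(obj[j]) for obj in padded]  # mean membership per object
        mu = mean(scores)
        cvs.append(pstdev(scores) / mu if mu > 0 else 0.0)
    total = sum(cvs)
    if total == 0:
        return [1.0 / n_attr] * n_attr  # fall back to equal weights
    return [cv / total for cv in cvs]


def weighted_distance(a, b, weights):
    """Weighted normalized Hamming distance between two padded HFSs."""
    return sum(w * mean([abs(x - y) for x, y in zip(ha, hb)])
               for w, ha, hb in zip(weights, a, b))


def density_peaks(dist, d_c, k):
    """Select k centers by rho * delta, then assign each point to the cluster
    of its nearest neighbor of higher density."""
    n = len(dist)
    # Local density with a Gaussian kernel (always positive).
    rho = [sum(math.exp(-(dist[i][j] / d_c) ** 2) for j in range(n) if j != i)
           for i in range(n)]
    order = sorted(range(n), key=lambda i: -rho[i])  # descending density
    delta = [0.0] * n
    parent = [-1] * n
    for pos, i in enumerate(order):
        if pos == 0:
            delta[i] = max(dist[i])  # densest point: farthest distance
        else:
            parent[i] = min(order[:pos], key=lambda j: dist[i][j])
            delta[i] = dist[i][parent[i]]
    centers = sorted(order, key=lambda i: -rho[i] * delta[i])[:k]
    labels = [-1] * n
    for c, i in enumerate(centers):
        labels[i] = c
    for i in order:  # parents are always labeled before their children
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels


def whfdp_sketch(data, d_c, k):
    """End-to-end illustration: pad, weight, build distance matrix, cluster."""
    padded = pad_dataset(data)
    weights = cv_weights(padded)
    dist = [[weighted_distance(a, b, weights) for b in padded] for a in padded]
    return density_peaks(dist, d_c, k)
```

Calling `whfdp_sketch(data, d_c=0.2, k=2)` on a list of objects, each a list of per-attribute membership lists, returns one cluster label per object. The distance matrix is quadratic in the number of objects, which is consistent with the polynomial complexity the abstract claims, although the paper's own analysis may differ in detail.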
Authors
ZHANG Yu (张煜)
LU Yi-hong (陆亿红)
HUANG De-cai (黄德才)
College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023, China
Source
Computer Science (《计算机科学》)
CSCD
Peking University Core Journal (北大核心)
2021, Issue 1, pp. 145-151 (7 pages)
Funding
Zhejiang Provincial Public Welfare Technology Application Project (LGG19E090001).
Keywords
Data mining
Clustering algorithm
Hesitant fuzzy sets
Density peaks
Coefficient of variation