Abstract
Unlike single-attribute data, mixed-attribute data usually have inconsistent scales across attributes. To obtain more accurate clustering results for such data, this paper proposes a mixed-attribute clustering algorithm based on k-nearest neighbors. First, a sliding window over the high-frequency coefficients is used to estimate the noise variance of the noisy mixed-attribute data. The BayesShrink threshold estimation algorithm then yields the optimal threshold, with which the data are denoised. Next, the k-nearest neighbor method is applied to cluster the data: feature weights are incorporated into the contribution of each denoised sample, and the feature-weighted Euclidean distance that includes this contribution is computed. The smaller the distance, the higher the probability that the samples belong to the same category. After all sample features have been weighted, a mixed-attribute clustering model is constructed, and particle swarm optimization is used to optimize the model and obtain the optimal weighted feature vector, which completes the clustering of the mixed-attribute data. Simulation results show that the proposed algorithm effectively improves both the accuracy and the efficiency of mixed-attribute clustering.
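The abstract names two standard building blocks, BayesShrink thresholding and a feature-weighted Euclidean distance, without giving formulas. The following is a minimal sketch of the textbook forms of both, not the authors' implementation. It assumes the data have already been transformed so that a list of high-frequency (detail) coefficients is available, and it substitutes the common median-based noise estimate for the paper's sliding-window estimator, whose details the abstract does not state. All function names are hypothetical, and the particle swarm search for the weights is omitted.

```python
import math
from statistics import median


def estimate_noise_sigma(detail_coeffs):
    """Robust noise estimate median(|d|) / 0.6745.

    Assumption: stands in for the paper's sliding-window estimator.
    """
    return median(abs(d) for d in detail_coeffs) / 0.6745


def bayes_shrink_threshold(detail_coeffs, noise_sigma):
    """Textbook BayesShrink threshold T = sigma_n^2 / sigma_x."""
    total_var = sum(d * d for d in detail_coeffs) / len(detail_coeffs)
    signal_var = max(total_var - noise_sigma ** 2, 0.0)
    if signal_var == 0.0:
        # Coefficients look like pure noise: pick a threshold that zeroes them all.
        return max(abs(d) for d in detail_coeffs)
    return noise_sigma ** 2 / math.sqrt(signal_var)


def soft_threshold(detail_coeffs, threshold):
    """Shrink each coefficient toward zero by `threshold`."""
    return [math.copysign(max(abs(d) - threshold, 0.0), d) for d in detail_coeffs]


def weighted_euclidean(x, y, weights):
    """Feature-weighted distance sqrt(sum_j w_j * (x_j - y_j)^2)."""
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(x, y, weights)))
```

In this sketch a weight of zero removes a feature from the distance entirely, which is why the choice of weight vector (found by particle swarm optimization in the paper) determines which samples count as nearest neighbors.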
Authors
董华松
连远锋
DONG Hua-song; LIAN Yuan-feng (College of Information Science and Engineering, China University of Petroleum (Beijing), Beijing 102249, China)
Source
《计算机仿真》
2024, No. 5, pp. 460-464 (5 pages)
Computer Simulation
Keywords
Mixed-attribute data
Threshold estimation algorithm
Particle swarm optimization