Abstract
Traditional clustering statistics methods are suited only to low-dimensional data. To improve clustering statistics on high-dimensional sparse data, this study designs a clustering statistics method for high-dimensional sparse data based on fuzzy data. Building on the fuzzy C-means clustering algorithm, the method first optimizes the initial cluster centers, which addresses the local-optimum problem and shortens clustering time. It then introduces a weighting mechanism so that the method applies to high-dimensional sparse data. On this basis, cosine distance replaces the original Euclidean distance, further improving clustering statistics on high-dimensional sparse data. Experiments show that the method clusters well across different data dimensionalities: when the dimensionality is low, a block ratio of 10% gives the best result, and when the dimensionality is high, a block ratio of 40% gives the best result. Across different sparsity levels, the method achieves both a high hit rate and high clustering efficiency.
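The abstract names three ingredients: fuzzy C-means, chosen (rather than arbitrary) initial centers, and cosine distance in place of Euclidean distance. The following is a minimal sketch of how these could fit together, not the paper's actual algorithm. Several parts are assumptions for illustration: the farthest-first initialization stands in for the paper's unspecified center optimization, the weighting mechanism is omitted because the abstract does not define it, and all function names and the toy data are hypothetical. Rows are scaled to unit length so that the weighted sum of points gives the center direction that minimizes the cosine-distance objective.

```python
import math

def normalize(v):
    """Scale v to unit length; an all-zero vector is returned unchanged."""
    n = math.sqrt(sum(a * a for a in v))
    return [a / n for a in v] if n > 0.0 else list(v)

def cosine_distance(x, c):
    """1 - cos(x, c), clipped at 0; defined as 1.0 if either vector is zero."""
    nx = math.sqrt(sum(a * a for a in x))
    nc = math.sqrt(sum(b * b for b in c))
    if nx == 0.0 or nc == 0.0:
        return 1.0
    dot = sum(a * b for a, b in zip(x, c))
    return max(0.0, 1.0 - dot / (nx * nc))

def farthest_first_centers(points, k):
    """Assumed heuristic: start from the first point, then repeatedly add
    the point farthest (in cosine distance) from the centers chosen so far."""
    centers = [list(points[0])]
    while len(centers) < k:
        best = max(points,
                   key=lambda p: min(cosine_distance(p, c) for c in centers))
        centers.append(list(best))
    return centers

def fuzzy_c_means_cosine(data, k, m=2.0, iters=50, eps=1e-9):
    """Fuzzy C-means with cosine distance as the dissimilarity.
    m is the fuzzifier (> 1); eps avoids division by zero."""
    points = [normalize(x) for x in data]
    centers = farthest_first_centers(points, k)
    dim = len(points[0])
    memberships = []
    for _ in range(iters):
        # Membership update: u_ij is proportional to d_ij^(-1/(m-1)).
        memberships = []
        for x in points:
            inv = [(cosine_distance(x, c) + eps) ** (-1.0 / (m - 1.0))
                   for c in centers]
            total = sum(inv)
            memberships.append([v / total for v in inv])
        # Center update: direction of the u^m-weighted sum of unit vectors.
        for j in range(k):
            acc = [0.0] * dim
            for x, u in zip(points, memberships):
                w = u[j] ** m
                for t in range(dim):
                    acc[t] += w * x[t]
            centers[j] = normalize(acc)
    return centers, memberships

# Toy sparse data (hypothetical): two groups with disjoint nonzero coordinates.
data = [[1.0, 0.0, 0.0, 0.0],
        [0.9, 0.1, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.8, 0.2]]
centers, memberships = fuzzy_c_means_cosine(data, k=2)
# Hard label per point: index of the cluster with the largest membership.
labels = [row.index(max(row)) for row in memberships]
```

Cosine distance depends only on the direction of a vector, which is one reason it tends to behave better than Euclidean distance when most coordinates are zero and vector norms vary widely.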
Author
ZHOU Yanru (周燕茹), School of Mathematics and Statistics, Chaohu University, Chaohu 238000, China
Source
Journal of Jilin Institute of Chemical Technology (《吉林化工学院学报》)
CAS
2021, No. 9, pp. 107-111 (5 pages)
Keywords
fuzzy mathematics
high-dimensional sparse data
clustering statistics
fuzzy C-means
cluster center
cosine distance