Abstract
Traditional segmental clustering methods for cloud storage data suffer from low operating efficiency and insufficiently smooth clustering results. To address these problems, a segmental clustering method for cloud storage data based on machine learning is proposed. First, several small data sets are sampled from the cloud storage database such that the small data sets contain all the natural clusters of the database. Second, a similarity matrix is constructed according to a defined similarity measure. Third, a nonlinear kernel principal component algorithm is used to measure the similarity of the data in the similarity matrix, and data with the same characteristics are grouped into one class according to this measure. Then, a mixed Gaussian distribution probability density model is used to compute the posterior probability of each class of data. Finally, segmental clustering of the cloud storage data is achieved by comparing these posterior probabilities. Experimental results show that the proposed method shortens the clustering run time, reduces the clustering variation degree to 29%, and thus effectively improves the smoothness of the clustering results.
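The abstract names the stages of the method but not their exact definitions (the similarity measure, kernel, sampling rule and model settings are not given). The sketch below strings the same stages together in Python with NumPy, under explicit assumptions: an RBF similarity matrix (assumed), kernel PCA by eigendecomposition of the centered similarity matrix, a diagonal-covariance Gaussian mixture fitted by EM, and assignment of each point to the cluster with the largest posterior. The function names (`sample_subsets`, `similarity_matrix`, `kernel_pca`, `gmm_posteriors`, `segmental_clustering`) and parameters (`sigma`, `n_components`, `n_iter`, `reg`) are hypothetical; this is an illustration, not the authors' implementation.

```python
# Hypothetical sketch of the pipeline named in the abstract; not the authors' code.
import numpy as np


def sample_subsets(data, n_subsets, subset_size, seed=0):
    """Draw several small random subsets (as row-index arrays) from the data.

    Assumption: uniform sampling without replacement. Unlike the sampling in the
    abstract, this does not guarantee that the subsets cover every natural cluster.
    """
    rng = np.random.default_rng(seed)
    return [rng.choice(len(data), size=subset_size, replace=False)
            for _ in range(n_subsets)]


def similarity_matrix(x, sigma=1.0):
    """Assumed similarity: Gaussian (RBF) kernel exp(-||xi - xj||^2 / (2 sigma^2))."""
    sq = np.sum(x ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * x @ x.T, 0.0)
    return np.exp(-d2 / (2.0 * sigma ** 2))


def kernel_pca(k, n_components=2):
    """Project the points behind kernel matrix k onto its leading kernel principal components."""
    n = k.shape[0]
    one = np.full((n, n), 1.0 / n)
    kc = k - one @ k - k @ one + one @ k @ one   # center the data in feature space
    vals, vecs = np.linalg.eigh(kc)              # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:n_components]
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))  # z_i = sqrt(lambda) * v_i


def gmm_posteriors(z, n_clusters, n_iter=50, reg=1e-6):
    """Fit a diagonal-covariance Gaussian mixture by EM; return P(cluster | z_i)."""
    n = z.shape[0]
    # Farthest-point seeds refined by a few k-means steps give the initial assignment.
    seeds = [0]
    for _ in range(1, n_clusters):
        dist = ((z[:, None, :] - z[seeds][None, :, :]) ** 2).sum(-1).min(axis=1)
        seeds.append(int(dist.argmax()))
    means = z[seeds].copy()
    for _ in range(10):
        labels = ((z[:, None, :] - means[None, :, :]) ** 2).sum(-1).argmin(axis=1)
        means = np.array([z[labels == c].mean(axis=0) if np.any(labels == c) else means[c]
                          for c in range(n_clusters)])
    resp = np.eye(n_clusters)[labels]
    for _ in range(n_iter):
        # M-step: mixture weights, means and diagonal variances from responsibilities.
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / n
        means = (resp.T @ z) / nk[:, None]
        var = np.maximum((resp.T @ z ** 2) / nk[:, None] - means ** 2, 0.0) + reg
        # E-step: log pi_c + log N(z_i | mu_c, diag(var_c)), normalized per row.
        diff = z[:, None, :] - means[None, :, :]
        log_p = np.log(weights)[None, :] - 0.5 * np.sum(
            diff ** 2 / var[None, :, :] + np.log(2.0 * np.pi * var[None, :, :]), axis=2)
        log_p -= log_p.max(axis=1, keepdims=True)   # stabilize before exponentiating
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
    return resp


def segmental_clustering(x, n_clusters, sigma=1.0, n_components=2):
    """Similarity matrix -> kernel PCA -> GMM posteriors -> label = most probable cluster."""
    z = kernel_pca(similarity_matrix(x, sigma), n_components)
    post = gmm_posteriors(z, n_clusters)
    return post.argmax(axis=1), post


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    data = np.vstack([rng.normal(0.0, 0.5, (30, 2)), rng.normal(10.0, 0.5, (30, 2))])
    for idx in sample_subsets(data, n_subsets=3, subset_size=40):
        labels, _ = segmental_clustering(data[idx], n_clusters=2, sigma=3.0)
        print(np.bincount(labels, minlength=2))
```

Using the similarity matrix directly as the kernel matrix is a choice made for this sketch, so that the kernel PCA step operates on the matrix built in the previous step. The abstract does not say how results from the different small data sets are combined, so the usage example simply clusters each subset independently.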
Authors
WANG Jun
YANG Ru
CHENG Xian-sheng
Inner Mongolia Agricultural University, Department of Computer Technology and Information Management, Baotou, Inner Mongolia 014109, China
Source
Computer Simulation (《计算机仿真》)
Peking University Core Journal (北大核心)
2020, No. 6, pp. 475-478 (4 pages)
Keywords
Natural cluster
Similarity matrix
Nonlinear kernel principal component analysis
Mixed Gaussian distribution probability density model