Abstract
To address the shortcoming that K-means clustering cannot be applied effectively to clustering scatter-diagram data, this paper introduces a Gaussian mixture model and designs an expectation-maximization likelihood equation. The likelihood that each data record belongs to each cluster is computed and then normalized to obtain the record's probability of membership in every cluster, and each record is assigned to the cluster for which that probability is highest. The study shows that K-means clustering with a Gaussian mixture model gives more stable results in data-mining clustering and that the resulting clusters are clearly separated; computing the expectation maximization brings the parameters of the data distribution into the best match with the clustered data.
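The steps named in the abstract (compute per-cluster likelihoods, normalize them into membership probabilities, assign each record to its most probable cluster) can be illustrated with a generic expectation-maximization loop for a Gaussian mixture. The following is a minimal sketch under stated assumptions, not the authors' implementation: the paper's exact likelihood equation is not given in this record, the function names `fit_gmm`, `responsibilities`, and `assign_clusters` are hypothetical, full covariance matrices are assumed, and the data are assumed to have at least two dimensions.

```python
import numpy as np

def gaussian_pdf(X, mean, cov):
    """Multivariate normal density evaluated at each row of X."""
    d = X.shape[1]
    diff = X - mean
    inv = np.linalg.inv(cov)
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))
    # Quadratic form diff_i^T inv diff_i for every row i
    expo = -0.5 * np.einsum("ij,jk,ik->i", diff, inv, diff)
    return np.exp(expo) / norm

def responsibilities(X, weights, means, covs):
    """E-step: weighted likelihoods per cluster, normalized so each row sums to 1."""
    lik = np.column_stack([
        weights[j] * gaussian_pdf(X, means[j], covs[j])
        for j in range(len(weights))
    ])
    return lik / lik.sum(axis=1, keepdims=True)

def fit_gmm(X, k, n_iter=50, init_means=None, seed=0):
    """Fit a k-component Gaussian mixture with plain EM (illustrative only)."""
    n, d = X.shape
    if init_means is None:
        # Assumption: start from k distinct data points chosen at random
        rng = np.random.default_rng(seed)
        means = X[rng.choice(n, size=k, replace=False)].astype(float)
    else:
        means = np.array(init_means, dtype=float)
    # Every component starts from the covariance of the whole data set
    covs = np.array([np.cov(X.T) + 1e-6 * np.eye(d) for _ in range(k)])
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        resp = responsibilities(X, weights, means, covs)   # E-step
        nk = resp.sum(axis=0)                              # soft cluster sizes
        weights = nk / n                                   # M-step: mixing weights
        means = (resp.T @ X) / nk[:, None]                 # M-step: means
        for j in range(k):                                 # M-step: covariances
            diff = X - means[j]
            covs[j] = (resp[:, j, None] * diff).T @ diff / nk[j] + 1e-6 * np.eye(d)
    return weights, means, covs

def assign_clusters(resp):
    """Assign each record to the cluster with the highest membership probability."""
    return resp.argmax(axis=1)
```

Calling `fit_gmm(X, k)` on an `(n, d)` array, then `responsibilities(...)` and `assign_clusters(...)`, yields one hard label per record while keeping the soft probabilities available. The small `1e-6` term added to each covariance is a numerical safeguard chosen for this sketch and is not taken from the paper.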
Authors
何爱华
郭有强
张自军
王硕
HE Aihua; GUO Youqiang; ZHANG Zijun; WANG Shuo (Department of Computer Science and Technology, Bengbu University, Bengbu 233030, China; College of Computer Science and Technology, University of Science and Technology of China, Hefei 233000, China)
Source
《广东石油化工学院学报》
2018, No. 6, pp. 73-77 (5 pages)
Journal of Guangdong University of Petrochemical Technology
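Funding
Key Project of Domestic and Overseas Visiting Study and Research for Outstanding Young and Middle-aged Backbone Talents in Universities, Anhui Provincial Department of Education (gxfx ZD2016269)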
Keywords
K-means clustering
Gaussian mixture model
Expectation-maximization
Scatter diagram