Abstract
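Outliers in the data pose a very difficult problem for clustering with mixture models. To make mixture fitting more robust, this paper replaces the Gaussian mixture model with a mixture of t distributions for fitting multivariate, multi-component Gaussian data contaminated by background noise. Two modified expectation-maximization (EM) algorithms for fitting the t mixture model are proposed and integrated with a model selection criterion; a combined-rule component annihilation strategy is applied to select the number of cluster components, yielding two corresponding robust clustering algorithms. Extensive experiments with different clustering algorithms on multiple Gaussian components with background noise show that the proposed robust clustering algorithms select the best number of cluster components automatically. Compared with clustering based on Gaussian mixture models, they are considerably more robust. Compared with clustering based on the traditional algorithms for t mixture models (EM/ECM), they effectively avoid those algorithms' heavy dependence on initial values and their tendency to converge to the boundary of the parameter space, and they show stronger robustness and faster convergence.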
Protecting against outliers in the data is a difficult problem when fitting mixture models for clustering. In this paper, we fit mixtures of t distributions, as an alternative to mixtures of normal distributions, to multi-component Gaussian data with background noise in order to improve the robustness of the fit. We propose two modified versions of the EM algorithm and integrate each with a model selection criterion, obtaining two robust clustering algorithms. These avoid the drawbacks of the traditional algorithms (EM/ECM) for fitting t mixture models, namely a strong dependence on initialization and possible convergence to the boundary of the parameter space, and they select the number of cluster components automatically through a combined component annihilation strategy. Experimental results compare the different algorithms and demonstrate the effectiveness of the proposed ones.
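To illustrate why a t mixture is less sensitive to outliers than a Gaussian mixture, the following is a minimal sketch of the standard EM iteration for a mixture of multivariate t distributions. It is not the modified EM proposed in the paper: the degrees of freedom `nu` and the number of components `k` are held fixed as simplifying assumptions, and there is no model selection or component annihilation. The names `t_logpdf`, `fit_t_mixture`, and `mu0` are hypothetical and chosen only for this example.

```python
import math
import numpy as np


def t_logpdf(X, mu, Sigma, nu):
    """Log-density of a multivariate t distribution, plus squared Mahalanobis distances."""
    p = X.shape[1]
    L = np.linalg.cholesky(Sigma)
    z = np.linalg.solve(L, (X - mu).T)           # whitened residuals, shape (p, n)
    maha = np.sum(z ** 2, axis=0)
    logdet = 2.0 * np.sum(np.log(np.diag(L)))
    logc = (math.lgamma((nu + p) / 2.0) - math.lgamma(nu / 2.0)
            - 0.5 * p * math.log(nu * math.pi) - 0.5 * logdet)
    return logc - 0.5 * (nu + p) * np.log1p(maha / nu), maha


def fit_t_mixture(X, k, nu=4.0, n_iter=200, mu0=None, seed=0):
    """Standard EM for a k-component t mixture with fixed nu (illustrative assumption)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    rng = np.random.default_rng(seed)
    # Initial means: user-supplied, or k distinct data points chosen at random.
    mu = (np.array(mu0, dtype=float) if mu0 is not None
          else X[rng.choice(n, size=k, replace=False)].copy())
    ridge = 1e-6 * np.eye(p)                     # keeps scale matrices positive definite
    Sigma = np.stack([np.atleast_2d(np.cov(X.T)) + ridge] * k)
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibilities tau and outlier-downweighting factors u.
        logp = np.empty((n, k))
        u = np.empty((n, k))
        for j in range(k):
            lp, maha = t_logpdf(X, mu[j], Sigma[j], nu)
            logp[:, j] = np.log(weights[j] + 1e-300) + lp
            u[:, j] = (nu + p) / (nu + maha)     # small for points far from component j
        logp -= logp.max(axis=1, keepdims=True)  # stabilise before exponentiating
        tau = np.exp(logp)
        tau /= tau.sum(axis=1, keepdims=True)
        # M-step: weighted updates; distant points contribute little through u.
        weights = tau.mean(axis=0)
        for j in range(k):
            w = tau[:, j] * u[:, j]
            mu[j] = w @ X / w.sum()
            d = X - mu[j]
            Sigma[j] = (w[:, None] * d).T @ d / tau[:, j].sum() + ridge
    return weights, mu, Sigma, tau, u
```

The factor `u` is what distinguishes this from Gaussian-mixture EM: a point with a large Mahalanobis distance receives a small weight in the mean and scale updates, so background noise pulls the estimates less. The sketch still depends on its starting values, which is the weakness of the traditional EM/ECM approach that the abstract says the proposed algorithms address.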
Source
《计算机科学》
CSCD
Peking University Core Journals (北大核心)
2007, No. 5, pp. 190-193 (4 pages)
Computer Science
Funding
Supported by the National Natural Science Foundation of China (Grant No. 60175001)
Keywords
Outlier
Robust clustering
Mixture of t distributions
Expectation-maximization algorithm
Model selection criterion