Abstract
Outliers in data severely disturb the estimation of probability density parameters and thereby reduce clustering accuracy. To achieve robust cluster analysis, a method based on the diversity self-paced t-mixture model is proposed. The model first adopts the t-distribution, whose tails are easily controlled, as the component model. On this basis, it uses the entropy-penalized expectation conditional maximization (ECM) algorithm as a pre-clustering step to estimate the initial parameters. The model then introduces the l2,1-norm as a self-paced regularization term and develops a new ECM optimization algorithm in order to select high-confidence samples from each component during training. Finally, experimental results on several real-world datasets under different noise environments show that the diversity self-paced t-mixture model outperforms state-of-the-art clustering methods. The work provides significant guidance for constructing robust mixture distribution models.
Funding
Supported by the 13th Five-Year National Science and Technology Supporting Project (2018YFC2000302).