Abstract
[Purposes] Self-training algorithms based on density peak clustering suffer from two problems: erroneously labeled samples reduce classification accuracy, and when the labeled samples are scattered, the clustering result is easily affected by the cutoff distance. To address both, a density peak self-training algorithm combining synthetic examples generation with adaboostENN is proposed. [Methods] First, a synthetic examples generation method is used to increase the number of labeled samples and improve the reliability of their spatial distribution. Second, the density peak clustering algorithm is used to reveal the spatial structure of the data, so that representative unlabeled samples can be selected for label prediction. Finally, an ensemble noise filter is used to detect erroneously labeled samples more accurately and delete them. [Findings] Experiments on 12 UCI datasets verify the effectiveness of the proposed algorithm. [Conclusions] The proposed algorithm not only effectively addresses the problem of unlabeled samples being erroneously labeled, but also makes the density peak clustering algorithm less susceptible to the cutoff distance.
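The density peak clustering step named in the abstract can be illustrated with a minimal sketch. It assumes the standard cutoff-kernel formulation of density peak clustering (local density as a neighbor count within the cutoff distance, plus the distance to the nearest denser point); the abstract does not state which variant the authors use. The function name `density_peaks` and the parameter `d_c` are hypothetical, chosen for illustration only.

```python
import numpy as np

def density_peaks(X, d_c):
    """Return (rho, delta, parent) for cutoff-kernel density peak clustering.

    Illustrative sketch only, not the authors' implementation.
    rho[i]    : number of other points closer than the cutoff distance d_c
    delta[i]  : distance to the nearest point with higher density
    parent[i] : index of that denser point (-1 for the densest point)
    """
    X = np.asarray(X, dtype=float)
    n = len(X)
    # Pairwise Euclidean distances.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Each point counts itself (distance 0), so subtract 1.
    rho = (dist < d_c).sum(axis=1) - 1
    # Visit points from densest to sparsest; ties are broken by index order.
    order = np.argsort(-rho, kind="stable")
    delta = np.zeros(n)
    parent = np.full(n, -1)
    for rank, i in enumerate(order):
        if rank == 0:
            # By convention the densest point gets its largest distance.
            delta[i] = dist[i].max()
            continue
        denser = order[:rank]
        j = denser[np.argmin(dist[i, denser])]
        delta[i] = dist[i, j]
        parent[i] = j
    return rho, delta, parent
```

Points with both large `rho` and large `delta` are candidate cluster centers, and the `parent` links describe the spatial structure a self-training loop could follow when choosing which unlabeled samples to label next. Because `rho` depends directly on `d_c`, the sketch also shows why the result is sensitive to the cutoff distance.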
Authors
LI Shuaijun (李帅军)
LÜ Jia (吕佳)
(College of Computer and Information Sciences, Chongqing Normal University; Chongqing Digital Agriculture Service Engineering Technology Research Center, Chongqing 401331, China)
Source
Journal of Chongqing Normal University: Natural Science (《重庆师范大学学报(自然科学版)》)
CAS
PKU Core Journal (北大核心)
2022, No. 4, pp. 105-113 (9 pages)
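Funding
Science and Technology Innovation Project for the "Construction of the Chengdu-Chongqing Economic Circle" of the Chongqing Municipal Education Commission (No. KJCX20024)
Chongqing University Innovation Research Group (No. XQT20015)
Chongqing Graduate Scientific Research Innovation Project (No. CYS20241)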
Keywords
semi-supervised learning
self-training algorithm
density peak clustering
synthetic examples generation
noise filter
spatial structure