Abstract
In semi-supervised clustering, labeled samples are usually used to guide and improve the clustering of the data, but the differing importance of individual samples to the clustering result is not fully considered. To address this problem, this paper proposes an adaptive semi-supervised clustering algorithm based on self-paced learning (ASSCSPL). First, an adaptive loss function is introduced into the model, and the robustness of the model can be improved by tuning the adaptive loss parameters. Second, a self-paced learning mechanism is introduced to characterize how much each sample matters to the clustering result. Finally, in the label propagation stage, the algorithm makes good use of the available supervised information and assigns corresponding label weights to the unlabeled data. Experiments show that the proposed algorithm achieves better clustering performance than existing competitive algorithms. The experimental results also show that the proposed algorithm effectively reduces the impact of noise on clustering performance.
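The three ingredients named in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so the forms below are assumptions: one common adaptive loss, (1 + σ)e²/(|e| + σ); hard 0/1 self-paced weights controlled by an "age" parameter λ; and standard iterative label propagation over a symmetrically normalized affinity matrix. The function names `adaptive_loss`, `self_paced_weights` and `label_propagation` and the parameters `sigma`, `lam` and `alpha` are hypothetical. This is not the authors' ASSCSPL implementation, and it omits the spectral clustering objective that ties the pieces together.

```python
import math
from typing import List

# Hypothetical helpers illustrating the abstract's components; not the paper's code.
Matrix = List[List[float]]


def adaptive_loss(e: float, sigma: float) -> float:
    """Adaptive loss (1 + sigma) * e^2 / (|e| + sigma) (assumed form).

    For |e| much smaller than sigma it grows roughly quadratically (like a
    squared loss); for |e| much larger than sigma it grows roughly linearly
    (like an absolute loss), so large residuals from noisy points are
    penalized less. sigma > 0 is the tunable adaptive-loss parameter.
    """
    return (1.0 + sigma) * e * e / (abs(e) + sigma)


def self_paced_weights(losses: List[float], lam: float) -> List[float]:
    """Hard self-paced weights v_i in {0, 1} (assumed scheme).

    They minimize sum_i (v_i * loss_i - lam * v_i) over v in [0, 1]^n:
    a sample is selected (v_i = 1) only if its loss is below lam.
    Raising lam across iterations gradually admits harder samples.
    """
    return [1.0 if loss < lam else 0.0 for loss in losses]


def label_propagation(W: Matrix, Y: Matrix, alpha: float = 0.9,
                      iters: int = 200) -> Matrix:
    """Iterative label propagation F <- alpha * S F + (1 - alpha) * Y.

    S = D^{-1/2} W D^{-1/2} is the symmetrically normalized affinity
    matrix. Y holds one-hot rows for labeled samples and zero rows for
    unlabeled ones. Rows of the returned F are normalized to sum to 1,
    so they act as soft label weights.
    """
    n, c = len(W), len(Y[0])
    d = [sum(row) for row in W]  # node degrees
    S = [[W[i][j] / math.sqrt(d[i] * d[j]) if d[i] > 0 and d[j] > 0 else 0.0
          for j in range(n)] for i in range(n)]
    F = [row[:] for row in Y]
    for _ in range(iters):
        # The right-hand side uses the previous F; it is rebound afterwards.
        F = [[alpha * sum(S[i][k] * F[k][j] for k in range(n))
              + (1.0 - alpha) * Y[i][j] for j in range(c)]
             for i in range(n)]
    out = []
    for row in F:
        total = sum(row)
        out.append([v / total for v in row] if total > 0 else row)
    return out
```

Usage: on a four-node chain graph whose two end nodes carry different labels, the two inner nodes receive soft label weights that lean toward the nearer labeled node. Keeping the three pieces as separate functions mirrors the abstract's three-step description; in an actual model they would be optimized jointly.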
Authors
JIA Leyao (贾乐瑶)
MA Yingcang (马盈仓)
XING Zhiwei (邢志伟)
MENG Yingying (蒙莹莹)
School of Science, Xi'an Polytechnic University, Xi'an 710048, China
Source
Journal of Northwest University (Natural Science Edition)
CAS
CSCD
Peking University Core Journal
2022, No. 5, pp. 847-856 (10 pages)
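Funding
National Natural Science Foundation of China (61976130)
Key Research and Development Program of Shaanxi Province (2018KW-021)
Natural Science Foundation of Shaanxi Province (2022KRM170)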
Keywords
semi-supervised
spectral clustering
self-paced learning
adaptive loss
label propagation