Abstract
To address the insufficient use of supervision information in semi-supervised clustering algorithms and the low information content of that supervision information, we propose a semi-supervised clustering algorithm combined with active learning. First, we use both class labels and pairwise constraints to guide the Kmeans clustering process, yielding SC-Kmeans, a semi-supervised clustering algorithm based on a Seeds set and pairwise constraints. Second, we introduce an active learning algorithm into SC-Kmeans to select supervision information with higher information content at as small a cost as possible, thereby improving the clustering accuracy of SC-Kmeans. Finally, we run simulation experiments on standard data sets from the UCI Machine Learning Repository. The experimental results show that the proposed algorithm achieves good clustering results and effectively improves clustering accuracy.
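The abstract does not give the details of SC-Kmeans, so the following is only a minimal sketch of the general idea it names: K-means initialized from a labeled seed set, with must-link and cannot-link pairs checked during assignment. The function name `seeded_constrained_kmeans`, its parameters, and the fallback rule for infeasible points are assumptions made for illustration, not the authors' method, and the active-learning selection step is not shown.

```python
def seeded_constrained_kmeans(points, k, seeds, must_link=(), cannot_link=(), max_iter=50):
    """Illustrative seeded K-means with pairwise constraints (hypothetical helper).

    points: list of equal-length numeric tuples
    seeds: dict mapping point index -> known cluster label in range(k)
    must_link / cannot_link: iterables of (i, j) index pairs
    """
    n, dim = len(points), len(points[0])

    def sq_dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    def centroid(members):
        return tuple(sum(points[i][d] for i in members) / len(members) for d in range(dim))

    def violates(i, c, labels):
        # Check only partners that already have a label in this pass.
        for a, b in must_link:
            if i in (a, b) and a != b:
                other = b if i == a else a
                if labels[other] is not None and labels[other] != c:
                    return True
        for a, b in cannot_link:
            if i in (a, b) and a != b:
                other = b if i == a else a
                if labels[other] == c:
                    return True
        return False

    # Initialize each centroid as the mean of the seeds labeled with that cluster.
    centroids = []
    for c in range(k):
        members = [i for i, lab in seeds.items() if lab == c]
        if not members:
            raise ValueError(f"cluster {c} has no seed point")
        centroids.append(centroid(members))

    labels = None
    for _ in range(max_iter):
        new_labels = [None] * n
        for i, lab in seeds.items():
            new_labels[i] = lab  # seed labels stay fixed
        for i in range(n):
            if new_labels[i] is not None:
                continue
            # Try clusters from nearest to farthest; take the first feasible one.
            order = sorted(range(k), key=lambda c: sq_dist(points[i], centroids[c]))
            feasible = [c for c in order if not violates(i, c, new_labels)]
            # Assumption: fall back to the nearest cluster if none is feasible.
            new_labels[i] = feasible[0] if feasible else order[0]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if members:
                centroids[c] = centroid(members)
    return labels, centroids
```

In this sketch a point that lies closer to one seed's cluster can still be pushed into another cluster by a single cannot-link pair, which is the kind of effect that makes the choice of which pairs to query matter.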
Source
Journal of Jilin University (Science Edition) (《吉林大学学报(理学版)》)
CAS
CSCD
Peking University Core Journals (北大核心)
2017, No. 3, pp. 664-672 (9 pages)
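Funding
Key Program of the National Natural Science Foundation of China (Grant No. 61133011)
Key Science and Technology Research Project of the Jilin Province Science and Technology Development Plan (Grant No. 20150204005GX)
Major Science and Technology Research Project of the Changchun Science and Technology Plan (Grant No. 14KG082)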