Abstract
The classic APCKmeans (active pairwise constrained K-means) algorithm uses active learning to construct a must-link constraint set and a cannot-link constraint set, which serve as supervisory information for semi-supervised clustering and improve the accuracy of the results. During sample assignment, however, the algorithm may make an assignment that is not the best one available at that step. This paper proposes a method that assigns labeled samples first and applies it to APCKmeans; the resulting APCKmeans_I algorithm achieves better clustering results with less supervisory information. The same strategy is applied to the PCKmeans (pairwise constrained K-means) algorithm, giving the improved PCKmeans_I algorithm. Experiments on UCI benchmark data sets show that the improved algorithms perform markedly better.
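The abstract does not spell out the assignment procedure, so the following is only a minimal sketch of the general idea, under stated assumptions: "labeled samples" are taken to be the points that appear in at least one must-link or cannot-link pair, they are assigned before all other points, and each assignment minimizes a PCKmeans-style cost (squared distance plus a penalty `w` per violated constraint). The function name `assign_constrained_first`, the single penalty weight `w`, and the index-based ordering are hypothetical choices made for illustration, not the authors' published method.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign_constrained_first(points, centers, must_link, cannot_link, w=1.0):
    """One assignment pass in which constrained points are handled first.

    Assumption: a point counts as "labeled" if it occurs in any constraint pair.
    Constraints are (i, j) index pairs; w is a hypothetical violation penalty.
    """
    constrained = {i for pair in must_link + cannot_link for i in pair}
    # Constrained points go first, then every remaining point.
    order = sorted(constrained) + [
        i for i in range(len(points)) if i not in constrained
    ]

    labels = {}
    for i in order:
        best_h, best_cost = None, float("inf")
        for h, center in enumerate(centers):
            cost = sq_dist(points[i], center)
            # Penalize placing i apart from an already assigned must-link partner.
            for a, b in must_link:
                if i in (a, b):
                    j = b if i == a else a
                    if j in labels and labels[j] != h:
                        cost += w
            # Penalize placing i together with an already assigned cannot-link partner.
            for a, b in cannot_link:
                if i in (a, b):
                    j = b if i == a else a
                    if j in labels and labels[j] == h:
                        cost += w
            if cost < best_cost:
                best_h, best_cost = h, cost
        labels[i] = best_h
    return [labels[i] for i in range(len(points))]


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.2, 0.0)]
    ctrs = [(0.0, 0.0), (2.0, 0.0)]
    print(assign_constrained_first(pts, ctrs, [], []))
    print(assign_constrained_first(pts, ctrs, [(0, 1)], [], w=10.0))
```

In a full algorithm this pass would alternate with a centroid update step until the assignments stop changing; only the ordering of the assignment pass is illustrated here.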
Authors
吕峰
柴变芳
李文斌
王垚
Lü Feng; Chai Bianfang; Li Wenbin; Wang Yao (School of Information Engineering, Hebei GEO University, Shijiazhuang 050031, China)
Source
《南京师范大学学报(工程技术版)》
CAS
2018, Issue 2, pp. 56-62 (7 pages)
Journal of Nanjing Normal University(Engineering and Technology Edition)
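Funding
National Natural Science Foundation of China (61503260)
Hebei Province Graduate Innovation Funding Project (CXZZSS2017131)
Hebei GEO University Teaching Reform Project (2017J04)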
Keywords
active semi-supervised clustering
pairwise constrained clustering
improved algorithm