Abstract
This paper proposes a global search algorithm for the K value based on semi-supervised K-means. The algorithm removes the constraint in traditional methods that K must equal the number of sample classes, and it uses only a small amount of labeled data to guide and organize a large amount of unlabeled data. Combining the distribution characteristics of the data set with the supervision information inside each cluster after clustering, it uses a voting rule to assign class labels to the data in each cluster. Experiments show that the proposed method can effectively find the best K value and cluster centers for a data set and improve clustering performance.
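The idea in the abstract can be illustrated with a minimal sketch. The abstract does not state how candidate K values are scored, so everything concrete here is an assumption: the function names (`kmeans`, `vote_labels`, `search_k`), the deterministic farthest-point seeding, and the selection rule (choose the smallest K whose majority-vote cluster labels agree best with the few labeled points). The toy data has two classes but three blobs, so the preferred K differs from the number of classes.

```python
from collections import Counter

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def init_centers(points, k):
    # Assumption: deterministic farthest-point seeding, not taken from the paper.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers

def kmeans(points, k, max_iter=100):
    centers = init_centers(points, k)
    assign = None
    for _ in range(max_iter):
        new_assign = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        if new_assign == assign:  # assignments stable: converged
            break
        assign = new_assign
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return centers, assign

def vote_labels(assign, labels, k):
    # Each cluster takes the majority label of its labeled members (None if it has none).
    votes = [Counter() for _ in range(k)]
    for a, y in zip(assign, labels):
        if y is not None:
            votes[a][y] += 1
    return [v.most_common(1)[0][0] if v else None for v in votes]

def search_k(points, labels, k_max):
    # Assumed criterion: smallest K with the highest agreement on the labeled points.
    labeled = [i for i, y in enumerate(labels) if y is not None]
    best = None
    for k in range(1, k_max + 1):
        centers, assign = kmeans(points, k)
        cluster_label = vote_labels(assign, labels, k)
        agreement = sum(cluster_label[assign[i]] == labels[i] for i in labeled) / len(labeled)
        if best is None or agreement > best[0]:
            best = (agreement, k, centers, [cluster_label[a] for a in assign])
    return best

# Toy data: class "a" forms two blobs, class "b" forms one blob between them.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (5, 0), (5, 1), (6, 0), (6, 1),
          (10, 0), (10, 1), (11, 0), (11, 1)]
labels = ["a", "a", None, None,
          "b", None, "b", None,
          "a", "a", None, None]  # only half of the points carry a label

agreement, best_k, centers, predicted = search_k(points, labels, k_max=5)
print(best_k, agreement, predicted)
```

On this toy set the search prefers three clusters even though there are only two classes, and the unlabeled points inherit the voted label of their cluster. A real implementation would need a more careful criterion, since agreement on labeled points tends to rise as K grows.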
Source
Journal of Beijing Jiaotong University (《北京交通大学学报》)
Indexed in: CAS, CSCD, Peking University Core Journals (北大核心)
2009, No. 6, pp. 106-109 (4 pages)
Funding
National Natural Science Foundation of China (60773062, 60873100)
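Hebei Province Science and Technology Support Program (072135188)
Scientific Research Program of the Hebei Provincial Department of Education (2008312)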