
Semi-supervised Clustering Algorithm Based on Double Adjustable Strategy
Abstract: Semi-supervised clustering algorithms typically use a small amount of labeled data to optimize cluster description parameters (such as cluster centroids) and then assign unlabeled data to clusters according to those parameters. They do not, however, account for the direct influence that labeled data can have on the cluster assignment of nearby unlabeled data. This paper proposes a double adjustment strategy: after the data are clustered according to the cluster parameters, the labeled data are used to adjust the cluster labels of the unlabeled data around them, which improves clustering accuracy. The adjustment range is determined dynamically from the data density around each labeled point, and a new similarity measure is used to improve the accuracy of the adjusted data. The strategy is applied to two algorithms, a multinomial model-based semi-supervised clustering algorithm and a semi-supervised fuzzy clustering algorithm, and is evaluated on several standard data sets. Experimental results show that the improved algorithms effectively increase the accuracy of semi-supervised clustering.
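The abstract gives only the outline of the adjustment step, so the following is a minimal sketch of the general idea rather than the paper's method. Several parts are assumptions made for illustration: a nearest-centroid rule stands in for the model-based clustering step, Euclidean distance stands in for the paper's new similarity measure, the adjustment radius is taken as the distance from a labeled point to its `k`-th nearest unlabeled point, and "double" is read as a mutual check (an unlabeled point is relabeled only by its own nearest labeled point, and only if it falls inside that point's radius). The function names `assign_by_centroid` and `adjust_labels` are hypothetical.

```python
import math

def assign_by_centroid(points, centroids):
    """Stand-in for the clustering step: label each point with its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
        for p in points
    ]

def adjust_labels(unlabeled, assigned, labeled, labeled_classes, k=2):
    """Let labeled points relabel nearby unlabeled points (assumed reading of the strategy).

    For each unlabeled point:
      1. it picks its nearest labeled point;
      2. that labeled point's radius is the distance to its k-th nearest
         unlabeled point, so dense neighborhoods give a small radius;
      3. the label is overwritten only if the unlabeled point lies within the radius.
    """
    adjusted = list(assigned)
    for j, u in enumerate(unlabeled):
        # Step 1: the unlabeled point selects its nearest labeled point.
        i = min(range(len(labeled)), key=lambda t: math.dist(u, labeled[t]))
        # Step 2: density-dependent radius around that labeled point.
        dists = sorted(math.dist(labeled[i], v) for v in unlabeled)
        radius = dists[min(k, len(dists)) - 1]
        # Step 3: the labeled point selects the unlabeled point if it is close enough.
        if math.dist(u, labeled[i]) <= radius:
            adjusted[j] = labeled_classes[i]
    return adjusted

# Toy data: the labeled point (1.8, 0.0) belongs to class 1 although it lies
# closer to the class-0 centroid, so its neighbors are initially misassigned.
centroids = [(0.0, 0.0), (4.0, 0.0)]
labeled = [(0.0, 0.0), (1.8, 0.0)]
labeled_classes = [0, 1]
unlabeled = [(0.2, 0.0), (1.7, 0.0), (1.9, 0.1), (4.0, 0.2), (-0.1, 0.1)]

assigned = assign_by_centroid(unlabeled, centroids)
adjusted = adjust_labels(unlabeled, assigned, labeled, labeled_classes, k=2)

print(assigned)  # [0, 0, 0, 1, 0]
print(adjusted)  # [0, 1, 1, 1, 0]
```

In this toy run the two unlabeled points next to the class-1 labeled point are moved from class 0 to class 1, while points outside every radius keep the label given by the centroid rule. The choice of `k` controls how far each labeled point's influence reaches; the abstract does not say how the paper sets the corresponding range.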
Source: Computer Technology and Development (《计算机技术与发展》), 2013, No. 2, pp. 1-6, 10 (7 pages in total)
Funding: Key Program of the National Natural Science Foundation of China (71031002); National Natural Science Foundation of China (70871016)
Keywords: semi-supervised clustering; unlabeled data; labeled data; similarity; multinomial model; fuzzy clustering

