
一种基于Seeds集和成对约束的半监督聚类算法 (Cited by: 7)

A semi-supervised clustering algorithm based on seeds and pair-wise constraints
Abstract: Semi-supervised clustering studies how to exploit a small amount of supervision information to improve clustering performance, and it has become a research focus in machine learning. Most existing semi-supervised clustering methods do not jointly consider the two kinds of supervision information, seed sets and pair-wise constraints, so a semi-supervised clustering algorithm based on seeds and pair-wise constraints is proposed. The algorithm applies the Tri-training algorithm to enlarge the seed set, and combines pair-wise constraints to optimize the seed set and to guide the clustering process. Experimental results show that the algorithm can effectively improve clustering performance.

Abstract (English): Semi-supervised learning, a kind of application-driven machine learning, has become one of the hot topics in artificial intelligence and pattern recognition. As the main branch of semi-supervised learning, semi-supervised clustering introduces a small amount of supervision information into the search for an optimal clustering. Recently, various semi-supervised clustering algorithms have been proposed, such as search-based methods, similarity-based methods, and methods combining search and similarity. However, most current semi-supervised clustering algorithms do not use the valuable seeds and pair-wise constraints at the same time. Therefore, a semi-supervised clustering algorithm based on seeds and pair-wise constraints is introduced, in order to make full use of the given supervision information. The Tri-training algorithm, a representative method based on the co-training mechanism, uses three classifiers to label unlabeled samples, and the proposed algorithm exploits it to obtain more labeled samples. Firstly, based on the Tri-training method, some unlabeled samples are selected and annotated to enlarge the initial labeled set. Secondly, pair-wise constraints are used to optimize the enlarged labeled set and improve its quality. Thirdly, initial cluster centers are computed from the optimized labeled samples. Finally, K-Means is carried out, and during the search the pair-wise constraints are used to modify the partitioning result at each iteration. The proposed algorithm is compared with the K-Means, Seeded-K-Means, and COP-K-Means algorithms, and experimental results on three UCI data sets under the same settings demonstrate that the method can take full advantage of the given supervision information and obtain better clustering results. Moreover, an experiment on the Haberman data set analyzes the relative impact of the numbers of pair-wise constraints and labeled samples on the algorithm's performance; the results show that the more pair-wise constraints or labeled samples are provided, the better the algorithm performs.
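The English abstract above describes a four-step procedure: enlarge the seed set via Tri-training, filter the enlarged seeds with the pair-wise constraints, take the initial cluster centers from the filtered seeds, and run K-Means while rejecting assignments that violate a constraint (the COP-K-Means idea used as a baseline). The Python sketch below only illustrates that procedure under stated assumptions and is not the authors' implementation: the three base classifiers, the one-round two-of-three agreement rule standing in for full Tri-training, the treatment of constraint pairs as indices into the array each function receives, and the helper names expand_seeds, filter_seeds and constrained_kmeans are all choices made for the example.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier


def expand_seeds(X_seed, y_seed, X_unlabeled):
    """Step 1 (simplified): label an unlabeled point when at least two of three
    classifiers agree -- a one-round stand-in for the Tri-training mechanism."""
    clfs = [DecisionTreeClassifier(), GaussianNB(), KNeighborsClassifier(n_neighbors=3)]
    preds = np.array([c.fit(X_seed, y_seed).predict(X_unlabeled) for c in clfs])
    agree = (preds[0] == preds[1]) | (preds[0] == preds[2]) | (preds[1] == preds[2])
    # label chosen by the agreeing pair (only used where `agree` is True)
    maj = np.where(preds[0] == preds[1], preds[0],
                   np.where(preds[0] == preds[2], preds[0], preds[1]))
    return (np.vstack([X_seed, X_unlabeled[agree]]),
            np.concatenate([y_seed, maj[agree]]))


def filter_seeds(X_seed, y_seed, must_link, cannot_link):
    """Step 2: drop seeds whose labels violate a pair-wise constraint.
    Constraints are (i, j) index pairs into the seed arrays passed in."""
    keep = np.ones(len(X_seed), dtype=bool)
    for i, j in must_link:
        if y_seed[i] != y_seed[j]:
            keep[i] = keep[j] = False
    for i, j in cannot_link:
        if y_seed[i] == y_seed[j]:
            keep[i] = keep[j] = False
    return X_seed[keep], y_seed[keep]


def _violates(i, c, assign, must_link, cannot_link):
    """True if putting point i into cluster c breaks a constraint, given the
    current assignment (-1 means not yet assigned)."""
    for a, b in must_link:
        j = b if a == i else a if b == i else None
        if j is not None and assign[j] != -1 and assign[j] != c:
            return True
    for a, b in cannot_link:
        j = b if a == i else a if b == i else None
        if j is not None and assign[j] == c:
            return True
    return False


def constrained_kmeans(X, X_seed, y_seed, must_link, cannot_link, n_iter=100):
    """Steps 3-4: initial centers come from the (filtered) seeds, then a
    COP-K-Means-style loop assigns each point to the nearest feasible center."""
    classes = np.unique(y_seed)
    centers = np.array([X_seed[y_seed == c].mean(axis=0) for c in classes])
    assign = np.full(len(X), -1)
    for _ in range(n_iter):
        new_assign = assign.copy()
        for i, x in enumerate(X):
            for c in np.argsort(np.linalg.norm(centers - x, axis=1)):
                if not _violates(i, c, new_assign, must_link, cannot_link):
                    new_assign[i] = c
                    break
            # if every cluster violates a constraint, the point keeps its
            # previous assignment (the original COP-K-Means would abort here)
        if np.array_equal(new_assign, assign):
            break
        assign = new_assign
        centers = np.array([X[assign == c].mean(axis=0) if np.any(assign == c)
                            else centers[c] for c in range(len(centers))])
    return assign, centers
```

Under these assumptions, a typical call chains the steps in the order given in the abstract: expand the seeds, filter them with the constraints, and pass the result to constrained_kmeans together with the constraints defined over the full data set.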
Source: Journal of Nanjing University (Natural Science), 2012, No. 4, pp. 405-411 (7 pages). Indexed in CAS, CSCD, and the Peking University Core Journal list.
Funding: National Natural Science Foundation of China (71031006, 70971080); National 973 Program preliminary research project (2011CB311805); Specialized Research Fund for the Doctoral Program of Higher Education (20101401110002).
Keywords: semi-supervised clustering; seeds; pair-wise constraints
  • Related literature

References (17)

  • 1 Zhu X J. Semi-supervised learning literature survey. Technical Report 1530, University of Wisconsin, Madison, 2008.
  • 2 Pedrycz W. Algorithms of fuzzy clustering with partial supervision. Pattern Recognition Letters, 1985, 3: 13-20.
  • 3 Basu S, Banerjee A, Mooney R J. Active semi-supervision for pair-wise constrained clustering. Proceedings of the 2004 SIAM International Conference on Data Mining, 2004, 333-344.
  • 4 Demiriz A, Bennett K P, Embrechts M J. Semi-supervised clustering using genetic algorithms. Proceedings of Intelligent Engineering Systems through Artificial Neural Networks, New York, 1999, 809-814.
  • 5 Hillel A B, Hertz T, Shental N, et al. Learning distance functions using equivalence relations. Proceedings of the 20th International Conference on Machine Learning, Washington, 2003, 11-18.
  • 6 Xing E P, Ng A Y, Jordan M I, et al. Distance metric learning with application to clustering with side-information. Advances in Neural Information Processing Systems, 2003, 15: 505-512.
  • 7 Xu Q J, desJardins M, Wagstaff K. Constrained spectral clustering under a local proximity structure assumption. Proceedings of the 18th International Florida Artificial Intelligence Research Society Conference, AAAI Press, 2005, 866-867.
  • 8 Basu S, Bilenko M, Mooney R J. A probabilistic framework for semi-supervised clustering. Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Seattle, 2004, 59-68.
  • 9 Bilenko M, Basu S, Mooney R J. Integrating constraints and metric learning in semi-supervised clustering. Proceedings of the 21st International Conference on Machine Learning, New York, 2004, 81-88.
  • 10 Yin X S, Chen S C, Hu E L, et al. Semi-supervised clustering with metric learning: An adaptive kernel method. Pattern Recognition, 2010, 43(4): 1320-1333.

Secondary references (55)

  • 1 Yang Jianlin. A classification method based on document-set similarity. Journal of the China Society for Scientific and Technical Information, 1999, 18(S1): 92-94. (Cited by: 5)
  • 2 Lin Chunyan, Zhu Donghua. A fuzzy clustering algorithm for scientific documents. Journal of Computer Applications, 2004, 24(11): 66-67. (Cited by: 9)
  • 3 Basu S, Banerjee A, Mooney R J. A probabilistic framework for semi-supervised clustering. In: Boulicaut J F, Esposito F, Giannotti F, Pedreschi D, eds. Proc. of the 10th ACM SIGKDD Int'l Conf. on Knowledge Discovery and Data Mining. New York: ACM Press, 2004. 59-68.
  • 4 Bilenko M, Basu S, Mooney R J. Integrating constraints and metric learning in semi-supervised clustering. In: Brodley C E, ed. Proc. of the 21st Int'l Conf. on Machine Learning. New York: ACM Press, 2004. 81-88.
  • 5 Tang W, Xiong H, Zhong S, Wu J. Enhancing semi-supervised clustering: A feature projection perspective. In: Berkhin P, Caruana R, Wu X D, eds. Proc. of the 13th ACM SIGKDD Int'l Conf. on Knowledge Discovery and Data Mining. New York: ACM Press, 2007. 707-716.
  • 6 Basu S, Banerjee A, Mooney R J. Active semi-supervision for pairwise constrained clustering. In: Jonker W, Petkovic M, eds. Proc. of the SIAM Int'l Conf. on Data Mining. Cambridge: MIT Press, 2004. 333-344.
  • 7 Yan B, Domeniconi C. An adaptive kernel method for semi-supervised clustering. In: Fürnkranz J, Scheffer T, Spiliopoulou M, eds. Proc. of the 17th European Conf. on Machine Learning. Berlin: Sigma Press, 2006. 18-22.
  • 8 Yeung D Y, Chang H. Extending the relevant component analysis algorithm for metric learning using both positive and negative equivalence constraints. Pattern Recognition, 2006, 39(5): 1007-1010.
  • 9 Beyer K, Goldstein J, Ramakrishnan R, Shaft U. When is "nearest neighbor" meaningful? In: Beeri C, Buneman P, eds. Proc. of the Int'l Conf. on Database Theory. New York: ACM Press, 1999. 217-235.
  • 10 Ding C H, Li T. Adaptive dimension reduction using discriminant analysis and K-means clustering. In: Ghahramani Z, ed. Proc. of the 19th Int'l Conf. on Machine Learning. New York: ACM Press, 2007. 521-528.

Co-citing literature (78)

Co-cited literature (92)

  • 1 Yao Tianfang, Lou Decheng. Research on semantic orientation analysis for topics in Chinese sentences. Journal of Chinese Information Processing, 2007, 21(5): 73-79. (Cited by: 77)
  • 2 TSOUMAKAS G, KATAKIS I. Multi-label classification: An overview[J]. International Journal of Data Warehousing and Mining, 2007, 3(3): 1-13.
  • 3 ZHU Xiaojin. Semi-supervised learning literature survey[R]. Madison, USA: University of Wisconsin-Madison, 2008.
  • 4 ZHOU Zhihua, ZHANG Minling, HUANG Shengjun, et al. Multi-instance multi-label learning[J]. Artificial Intelligence, 2012, 176(1): 2291-2320.
  • 5 ZHANG Minling, ZHANG Kun. Multi-label learning by exploiting label dependency[C]//Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Washington, DC, USA, 2010: 999-1007.
  • 6 BOUTELL M R, LUO Jiebo, SHEN Xipeng, et al. Learning multi-label scene classification[J]. Pattern Recognition, 2004, 37(9): 1757-1771.
  • 7 FURNKRANZ J, HULLERMEIER E, MENCIA E L, et al. Multi-label classification via calibrated label ranking[J]. Machine Learning, 2008, 73(2): 133-153.
  • 8 TSOUMAKAS G, VLAHAVAS I. Random k-labelsets: An ensemble method for multilabel classification[C]//Proceedings of the 18th European Conference on Machine Learning. Berlin: Springer, 2007: 406-417.
  • 9 ZHANG Minling, ZHOU Zhihua. ML-kNN: A lazy learning approach to multi-label learning[J]. Pattern Recognition, 2007, 40(7): 2038-2048.
  • 10 ELISSEEFF A, WESTON J. A kernel method for multi-labelled classification[M]//DIETTERICH T G, BECKER S, GHAHRAMANI Z. Advances in Neural Information Processing Systems 14. Cambridge, USA: The MIT Press, 2002: 681-687.

Citing literature (7)

Secondary citing literature (22)
