
A Mid-Perpendicular Hyperplane Similarity Criterion Based on Pairwise Constraints

Cited by: 1
Abstract: Measuring the similarity between data objects is a fundamental task for distance-based techniques in data mining and machine learning, e.g., distance-based clustering and classification. For a given problem, choosing a proper similarity measure makes the problem easier to solve. A growing body of research shows that fully exploiting pairwise constraints (must-link and cannot-link constraints) to obtain a similarity measure matched to the problem can substantially improve algorithm performance. Most existing work on similarity measurement with pairwise constraints focuses on distance metric learning, which uses the constraint information to learn a distance metric matrix and then performs classification or clustering. In this paper, inspired by the hyperplanes used in nearest-neighbor and support vector machine classifiers, we propose a new similarity criterion called mid-perpendicular hyperplane similarity (MPHS), which can effectively learn from pairwise constraints, especially cannot-link constraints. We then apply this criterion to clustering and classification tasks, deriving a clustering algorithm and a classification algorithm. Finally, we validate the effectiveness of the proposed method by comparing it with several state-of-the-art algorithms through extensive experiments on a number of benchmark datasets, and draw some empirical insights from the results.
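The geometric idea behind a mid-perpendicular hyperplane can be sketched briefly. For a cannot-link pair (a, b), the perpendicular bisector of the segment ab is the hyperplane of points equidistant from a and b; a point's signed distance to that hyperplane indicates which of the two it is more similar to. The function name and scoring rule below are illustrative assumptions for this one-pair case, not the paper's exact MPHS formulation:

```python
import math

def mid_perpendicular_score(x, a, b):
    """Signed distance from x to the perpendicular bisector of segment (a, b).

    Negative values mean x falls on a's side, positive values on b's side,
    and 0 means x is equidistant from a and b (it lies on the hyperplane).
    """
    w = [bi - ai for ai, bi in zip(a, b)]          # normal vector of the bisector
    m = [(ai + bi) / 2.0 for ai, bi in zip(a, b)]  # midpoint lies on the bisector
    dot = sum(wi * (xi - mi) for wi, xi, mi in zip(w, x, m))
    return dot / math.sqrt(sum(wi * wi for wi in w))

# For a cannot-link pair on the x-axis, a point directly above the midpoint
# is equidistant from both endpoints.
a, b = (0.0, 0.0), (2.0, 0.0)
print(mid_perpendicular_score((1.0, 5.0), a, b))  # 0.0
```

In a clustering or classification setting, such per-pair scores would then be aggregated over all cannot-link constraints; the aggregation scheme is specific to the paper and is not reproduced here.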
Published in: Journal of Computer Research and Development (《计算机研究与发展》; EI, CSCD, Peking University Core), 2012, No. 11, pp. 2283-2288 (6 pages)
Funding: National Natural Science Foundation of China (60875030)
Keywords: similarity measurement; pairwise constraints; distance metric learning; clustering; classification
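For context on the "distance metric learning" line of work the abstract contrasts itself with: the learned distance metric matrix M is typically used in a Mahalanobis-form distance d(x, y)² = (x − y)ᵀ M (x − y), as in Xing et al.'s approach cited below. The following is a minimal sketch of evaluating that form for a given M (learning M itself is the hard part and is not shown); all names here are illustrative:

```python
def mahalanobis_sq(x, y, M):
    """Squared Mahalanobis-form distance (x - y)^T M (x - y) for a learned matrix M."""
    d = [xi - yi for xi, yi in zip(x, y)]
    n = len(d)
    return sum(d[i] * M[i][j] * d[j] for i in range(n) for j in range(n))

# With M = identity this reduces to the squared Euclidean distance;
# a diagonal M reweights the coordinates.
M = [[1.0, 0.0],
     [0.0, 4.0]]  # second coordinate counts four times as much
print(mahalanobis_sq((0.0, 0.0), (1.0, 1.0), M))  # 5.0
```

Metric learning methods choose M (constrained to be positive semidefinite) so that must-link pairs end up close and cannot-link pairs end up far under this distance.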

References (15)

  • 1 Quang I S, Bao H T. An association-based dissimilarity measure for categorical data [J]. Pattern Recognition Letters, 2005, 26(16): 2549-2557.
  • 2 Kulis B, Basu S, Dhillon I, et al. Semi-supervised graph clustering: A kernel approach [C] //Proc of the 22nd Int Conf on Machine Learning. New York: ACM, 2005: 457-464.
  • 3 Li Zhenguo, Liu Jianzhuang, Tang Xiaoou. Pairwise constraint propagation by semidefinite programming for semi-supervised classification [C] //Proc of the 25th Int Conf on Machine Learning. New York: ACM, 2008: 576-583.
  • 4 Xing E P, Ng A Y, Jordan M I, et al. Distance metric learning, with application to clustering with side-information [C] //Proc of the 15th Conf on Advances in Neural Information Processing Systems (NIPS). Cambridge: MIT Press, 2003: 505-512.
  • 5 Bilenko M, Basu S, Mooney R J. Integrating constraints and metric learning in semi-supervised clustering [C] //Proc of the 21st Int Conf on Machine Learning. New York: ACM, 2004: 576-583.
  • 6 Tang Wei, Xiong Hui, Zhong Shi, et al. Enhancing semi-supervised clustering: A feature projection perspective [C] //Proc of the 13th ACM SIGKDD Int Conf on Knowledge Discovery and Data Mining. New York: ACM, 2007: 707-716.
  • 7 Hoi S, Jin R, Lyu M. Learning nonparametric kernel matrices from pairwise constraints [C] //Proc of the 24th Int Conf on Machine Learning. New York: ACM, 2007: 361-368.
  • 8 Bar-Hillel A, Hertz T, Shental N, et al. Learning distance functions using equivalence relations [C] //Proc of the 20th Int Conf on Machine Learning. Menlo Park: AAAI, 2003: 11-18.
  • 9 Cohn D, Caruana R, McCallum A. Semi-supervised clustering with user feedback [R/OL]. Cornell University, 2003 [2012-05-08]. https://ecommons.library.cornell.edu/bitstream/1813/5608/1/TR2003-1892.ps
  • 10 Bilenko M, Mooney R J. Adaptive duplicate detection using learnable string similarity measures [C] //Proc of the 9th ACM SIGKDD Int Conf on Knowledge Discovery and Data Mining. New York: ACM, 2003: 39-48.

