
Clustering by Pattern Similarity (cited by: 2)

Abstract: The task of clustering is to identify classes of similar objects among a set of objects. The definition of similarity varies from one clustering model to another. However, in most of these models the concept of similarity is based on metrics such as Manhattan distance, Euclidean distance, or other Lp distances. In other words, similar objects must have close values in at least a set of dimensions.
In this paper, we explore a more general type of similarity. Under the pCluster model we propose, two objects are similar if they exhibit a coherent pattern on a subset of dimensions. The new similarity concept models a wide range of applications. For instance, in DNA microarray analysis, the expression levels of two genes may rise and fall synchronously in response to a set of environmental stimuli. Although the magnitudes of their expression levels may not be close, the patterns they exhibit can be very much alike. Discovery of such clusters of genes is essential in revealing significant connections in gene regulatory networks. E-commerce applications, such as collaborative filtering, can also benefit from the new model, because it captures not only the closeness of values of certain leading indicators but also the closeness of the (purchasing, browsing, etc.) patterns exhibited by customers. In addition to the novel similarity model, this paper introduces an effective and efficient algorithm to detect such clusters, and we perform tests on several real and synthetic data sets to show its performance.
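The contrast the abstract draws between distance-based similarity and pattern-based similarity can be illustrated with a small sketch. The abstract does not state the paper's formal definition, so the coherence measure below (the spread of per-dimension differences between two objects, named `shift_spread` here) is an assumption chosen for illustration, and the expression values are invented toy data.

```python
import math

def euclidean_distance(x, y):
    """Ordinary L2 distance over all dimensions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def shift_spread(x, y, dims):
    """Illustrative coherence measure (an assumption, not the paper's definition):
    the range of the differences x[d] - y[d] over the chosen dimensions.
    A value of 0 means the two objects differ by a constant shift on those
    dimensions, i.e. they rise and fall together."""
    diffs = [x[d] - y[d] for d in dims]
    return max(diffs) - min(diffs)

# Toy expression levels of two hypothetical genes under five conditions.
gene_a = [10.0, 50.0, 20.0, 80.0, 35.0]
gene_b = [110.0, 150.0, 120.0, 180.0, 90.0]

# Far apart in value on every dimension, so distance-based models
# would not group them.
dist = euclidean_distance(gene_a, gene_b)

# On the first four conditions the two genes follow the same pattern.
spread_subset = shift_spread(gene_a, gene_b, [0, 1, 2, 3])

# Including the fifth condition breaks the coherence.
spread_all = shift_spread(gene_a, gene_b, range(5))

print(f"Euclidean distance:        {dist:.1f}")
print(f"spread on conditions 0-3:  {spread_subset}")
print(f"spread on all conditions:  {spread_all}")
```

The two toy genes are distant in value yet perfectly coherent on a subset of conditions, which is the situation the abstract describes. Finding such object and dimension subsets efficiently is the problem the paper's algorithm addresses; this sketch only checks a given pair on a given subset.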
Authors: Haixun Wang (王海勋), Jian Pei (裴健)
Source: Journal of Computer Science & Technology (计算机科学技术学报(英文版)), indexed in SCIE, EI, CSCD; 2008, Issue 4, pp. 481-496 (16 pages)
Keywords: data mining, clustering, pattern similarity

