Clustering by Pattern Similarity


Abstract: The task of clustering is to identify classes of similar objects among a set of objects. The definition of similarity varies from one clustering model to another. In most of these models, however, similarity is based on metrics such as Manhattan distance, Euclidean distance, or other Lp distances. In other words, similar objects must have close values on at least some set of dimensions.
In this paper, we explore a more general type of similarity. Under the pCluster model we propose, two objects are similar if they exhibit a coherent pattern on a subset of dimensions. The new similarity concept models a wide range of applications. For instance, in DNA microarray analysis, the expression levels of two genes may rise and fall synchronously in response to a set of environmental stimuli. Although the magnitudes of their expression levels may not be close, the patterns they exhibit can be very much alike. Discovering such clusters of genes is essential to revealing significant connections in gene regulatory networks. E-commerce applications, such as collaborative filtering, can also benefit from the new model, because it captures not only the closeness of values of certain leading indicators but also the closeness of the (purchasing, browsing, etc.) patterns exhibited by customers. In addition to the novel similarity model, this paper introduces an effective and efficient algorithm to detect such clusters, and we report tests on several real and synthetic data sets to show its performance.
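The contrast between distance-based and pattern-based similarity can be made concrete with a small sketch. The abstract does not state the formal definition, so the code below assumes the pScore form commonly given for the pCluster model: for two objects x, y and two dimensions a, b, the score is |(x_a - x_b) - (y_a - y_b)|, and a set of objects is coherent on a set of dimensions when every such score is at most a threshold delta. The function names, the threshold, and the gene values are hypothetical and chosen only for illustration; this is a brute-force check, not the detection algorithm the paper introduces.

```python
import math
from itertools import combinations


def p_score(x, y, a, b):
    """Assumed pScore of the 2x2 submatrix for objects x, y on dimensions a, b."""
    return abs((x[a] - x[b]) - (y[a] - y[b]))


def is_coherent(objects, dims, delta):
    """True if every object pair and every dimension pair has pScore <= delta."""
    return all(
        p_score(x, y, a, b) <= delta
        for x, y in combinations(objects, 2)
        for a, b in combinations(dims, 2)
    )


def euclidean(x, y, dims):
    """Euclidean distance between x and y restricted to the given dimensions."""
    return math.sqrt(sum((x[d] - y[d]) ** 2 for d in dims))


# Hypothetical expression levels under four stimuli (illustrative values only).
gene1 = [10.0, 50.0, 30.0, 70.0]
gene2 = [110.0, 150.0, 130.0, 170.0]  # gene1 shifted up by 100 on every dimension
gene3 = [20.0, 60.0, 5.0, 80.0]       # follows gene1 except on dimension 2

all_dims = [0, 1, 2, 3]
subset = [0, 1, 3]

# Far apart in Euclidean terms, yet perfectly coherent as a pattern.
print(euclidean(gene1, gene2, all_dims))
print(is_coherent([gene1, gene2], all_dims, delta=1.0))

# gene3 matches the pattern only on a subset of the dimensions.
print(is_coherent([gene1, gene3], all_dims, delta=1.0))
print(is_coherent([gene1, gene2, gene3], subset, delta=1.0))
```

Here gene1 and gene2 are never close in value but rise and fall together, so the pattern check accepts them while a distance threshold would not. gene3 joins the group only when dimension 2 is excluded, which is the "subset of dimensions" aspect of the model.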

Authors: Haixun Wang (王海勋), Jian Pei (裴健)

Affiliations: IBM T. J. Watson Research Center; Simon Fraser University

Source: Journal of Computer Science & Technology (计算机科学技术学报, English edition), 2008, No. 4, pp. 481-496 (16 pages). Indexed in SCIE, EI, CSCD.

Keywords: data mining, clustering, pattern similarity

Classification: TP301.6 [Automation and Computer Technology: Computer Systems Architecture]




