Clustering in Very Large Databases Based on Distance and Density (Cited by: 14)


Abstract: Clustering in very large databases or data warehouses, with applications in areas such as spatial computation, web information collection, pattern recognition, and economic analysis, is a major challenge for data mining researchers. Current clustering methods suffer from three problems: 1) scanning the whole database leads to high I/O cost and expensive maintenance (e.g., R*-tree); 2) the uncertain parameter k must be specified in advance, so the clustering can only be refined through repeated trial and error; 3) they handle clusters of arbitrary shape inefficiently on very large data sets. In this paper, we present a new hybrid clustering algorithm to solve these problems. The algorithm combines distance and density strategies and can handle clusters of arbitrary shape effectively. It makes full use of statistical information during mining to greatly reduce time complexity while maintaining good clustering quality. Furthermore, the algorithm can easily eliminate noise and identify outliers. An experimental evaluation is performed on a spatial database, comparing this method with other popular clustering algorithms (CURE and DBSCAN). The results show that our algorithm outperforms them in terms of efficiency and cost, and that its speedup grows as the data size scales up.

Authors: 钱卫宁 (Qian Weining), 宫学庆 (Gong Xueqing), 周傲英 (Zhou Aoying)

Affiliation: Department of Computer Science and Engineering

Source: Journal of Computer Science & Technology (计算机科学技术学报（英文版）), indexed in SCIE, EI, CSCD; 2003, No. 1, pp. 67-76 (10 pages)

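Funding: National Key Basic Research Development Program (973 Program); Specialized Research Fund for the Doctoral Program of Higher Education; Microsoft Research Fellowship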

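Classification: TP311.13 [Automation and Computer Technology: Computer Software and Theory]; P208 [Astronomy and Earth Sciences: Cartography and Geographic Information Engineering]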

Citation network

References (12)

1. Sheikholeslami G et al. WaveCluster: A multi-resolution clustering approach for very large spatial databases. In Proc. 24th Int. Conf. Very Large Data Bases, Gupta A, Shmueli O, Widom J (eds.), New York City, Morgan Kaufmann, 1998, pp.428-438.
2. Zhang T, Ramakrishnan R, Livny M. BIRCH: An efficient data clustering method for very large databases. In Proc. 1996 ACM SIGMOD International Conference on Management of Data, Jagadish H V, Mumick I S (eds.), Quebec: ACM Press, 1996, pp.103-114.
3. Guha S et al. CURE: An efficient clustering algorithm for large databases. In Proc. 1998 ACM SIGMOD Int. Conf. Management of Data, Haas L M, Tiwary A (eds.), Seattle: ACM Press, 1998, pp.73-84.
4. Kaufman L et al. Finding Groups in Data: An Introduction to Cluster Analysis. John Wiley & Sons, 1990.
5. Ng R T, Han J. Efficient and effective clustering methods for spatial data mining. In Proc. the 20th Int. Conf. Very Large Data Bases (VLDB'94), Bocca J B, Jarke M, Zaniolo C (eds.), Santiago de Chile, Chile: Morgan Kaufmann, 1994, pp.144-155.
6. Jain Anil K. Algorithms for Clustering Data. Prentice Hall, 1988.
7. Ester M et al. A density-based algorithm for discovering clusters in large spatial databases with noise. In Proc. the 2nd International Conference on Knowledge Discovery and Data Mining (KDD-96), Simoudis E, Han J, Fayyad U M (eds.), AAAI Press, 1996, pp.226-231.
8. Ankerst M et al. OPTICS: Ordering points to identify the clustering structure. In Proc. 1999 ACM SIGMOD International Conference on Management of Data, Delis A, Faloutsos C, Ghandeharizadeh S (eds.), Philadelphia: ACM Press, 1999, pp.49-60.
9. Agrawal R, Gehrke J, Gunopulos D et al. Automatic subspace clustering of high dimensional data for data mining applications. In Proc. 1998 ACM SIGMOD Int. Conf. Management of Data, Haas L M, Tiwary A (eds.), Seattle: ACM Press, 1998, pp.94-105.
10. Wang W, Yang J, Muntz R. STING: A statistical information grid approach to spatial data mining. In Proc. 23rd International Conference on Very Large Data Bases, Jarke M, Carey M J, Dittrich K R, Lochovsky F H, Loucopoulos P, Jeusfeld M A (eds.), Athens, Greece: Morgan Kaufmann, 1997, pp.186-195.

Co-cited literature (114)

1. 罗毅.高校图书馆荐购系统现状与问题研究[J].图书馆学研究（应用版）,2010(12):46-49. Cited by: 39
2. 李元臣,刘维群.基于Dijkstra算法的网络最短路径分析[J].微计算机应用,2004,25(3):295-298. Cited by: 70
3. 冯兴杰,黄亚楼.增量式CURE聚类算法研究[J].小型微型计算机系统,2004,25(10):1847-1849. Cited by: 9
4. 周军锋,汤显,郭景峰.一种优化的协同过滤推荐算法[J].计算机研究与发展,2004,41(10):1842-1847. Cited by: 103
5. 刘迅芳.读者网上文献荐购方法[J].河北科技图苑,2004,17(6):54-55. Cited by: 13
6. 崔杰,任家东.分布式关联规则挖掘中的聚类分区算法[J].计算机工程,2004,30(23):67-68. Cited by: 2
7. 袁方,孟增辉,于戈.对k-means聚类算法的改进[J].计算机工程与应用,2004,40(36):177-178. Cited by: 48
8. 刘高军,朱嬿.基于数据挖掘技术的建筑企业信用评价[J].中国矿业大学学报,2005,34(4):494-499. Cited by: 21
9. 谭华.流式数据挖掘方法下的汇率行为预测方法探讨[J].湘南学院学报,2010,31(4):27-29. Cited by: 1
10. 顾柏园,王荣本,余天洪,郭烈.基于视觉的前方车辆探测技术研究方法综述[J].公路交通科技,2005,22(10):114-119. Cited by: 14

Citing literature (14)

1. 沈洁,赵雷,杨季文,李榕.一种基于划分的层次聚类算法[J].计算机工程与应用,2007,43(31):175-177. Cited by: 13
2. 冯少荣,肖文俊.DBSCAN聚类算法的研究与改进[J].中国矿业大学学报,2008,37(1):105-111. Cited by: 88
3. 陆宇,岳昆,刘惟一.一种基于贝叶斯网的交通拥堵预测方法[J].云南大学学报（自然科学版）,2010,32(S1):355-363. Cited by: 5
4. 关超华,陈泳丹,陈慧岩,龚建伟.基于改进DBSCAN算法的激光雷达车辆探测方法[J].北京理工大学学报,2010,30(6):732-736. Cited by: 16
5. 武建伟,俞晓红,陈文清.基于密度的动态协同过滤图书推荐算法[J].计算机应用研究,2010,27(8):3013-3015. Cited by: 12
6. 王晓峰,张国毅,王然.一种新的未知雷达信号快速分选方法[J].电子信息对抗技术,2011,26(5):19-22. Cited by: 6
7. 唐小新,李高虎,唐秋鸿,曹红兵,高嵩.高校图书馆个性化电子图书荐购系统的设计和实现[J].现代图书情报技术,2012(3):83-88. Cited by: 13
8. 冯万兴,朱晔,郭钧天,张晓庆,刘娟.基于改进的DBSCAN方法和多项式拟合的雷电短时预测[J].计算机工程与科学,2014,36(10):2028-2033. Cited by: 9
9. 王山,冯锋,王洪伟.基于RSSI的三维空间定位算法研究[J].电脑知识与技术,2016(7):221-224. Cited by: 1
10. 陈正兵.基于深度图像的室内三维平面分割方法研究[J].电子设计工程,2016,24(24):158-160. Cited by: 4

Second-level citing literature (183)

1. 曾子涵.基于贝叶斯网络的交通拥堵实时预测[J].冶金管理,2019,0(21):24-24.
2. 宗长富,文龙,何磊.基于欧几里得聚类算法的三维激光雷达障碍物检测技术[J].吉林大学学报（工学版）,2020,50(1):107-113. Cited by: 24
3. 毕方明,张虹,曹天杰.非均匀Hilbert曲线的生成算法[J].中国矿业大学学报,2009,38(5):729-734. Cited by: 3
4. 徐德,谭维,杨燕,侯天子,黄乐.I-Miner环境下聚类分析算法研究与实现[J].现代计算机,2009,15(2):30-34.
5. 陆宇,岳昆,刘惟一.一种基于贝叶斯网的交通拥堵预测方法[J].云南大学学报（自然科学版）,2010,32(S1):355-363. Cited by: 5
6. 赵杰,杨柳.聚类分析算法dBscan的改进与实现[J].微电子学与计算机,2009,26(11):189-192. Cited by: 14
7. 邱萌,乔秀全,徐惠民.一种基于模糊聚类的无线传感器网络定位算法[J].武汉理工大学学报（交通科学与工程版）,2009,33(6):1203-1206. Cited by: 1
8. 张忠林,曹志宇,李元韬.基于加权欧式距离的k_means算法研究[J].郑州大学学报（工学版）,2010,31(1):89-92. Cited by: 35
9. 沙露,鲍培明,李尼格.基于蚁群系统的聚类算法研究[J].山东大学学报（工学版）,2010,40(3):13-18. Cited by: 7
10. 王丹丹,付华,徐耀松.基于DBSCAN算法的煤矿瓦斯监测信息聚类分析方法研究[J].工矿自动化,2010,36(8):45-48. Cited by: 2
