
Applied Research on Clustering Algorithm Using Density Biased Sampling Technology
Cited by: 7
Abstract: To address the difficulty of clustering large-scale data sets, this paper analyzes the advantages of sampling techniques, examines the characteristics of random sampling in data mining, and on that basis proposes a density-based biased sampling method. The sample obtained by density biased sampling reflects the characteristics of the full data set fairly accurately, and the method allows the sampling rate to be controlled flexibly for different regions of the data set. Experiments show that, when clustering large-scale data sets, density biased sampling outperforms random sampling in terms of time complexity.
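Source: Computer Science (《计算机科学》), CSCD, Peking University Core Journal, 2009, No. 2, pp. 207-209, 264 (4 pages)
Funding: Key Program of the National Natural Science Foundation of China (70031010); one of the research papers of the 985 Philosophy and Social Sciences Innovation Base construction project; supported by the Program for New Century Excellent Talents
Keywords: data mining, clustering, biased sampling, random sampling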
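
The abstract describes sampling whose rate can be tuned per region of the data set according to local density, but this record does not give the algorithm itself. The sketch below illustrates one common grid-based way to do this and is not necessarily the authors' method. The function name `density_biased_sample` and the parameters `cell_size`, `target_size`, and `e` are assumptions introduced here for illustration. Points are binned into grid cells, and each point in a cell holding n points is kept with probability proportional to n^(-e), scaled so that the expected sample size is `target_size`. With `e = 0` this reduces to uniform random sampling; larger `e` under-samples dense regions and over-samples sparse ones.

```python
import random
from collections import defaultdict


def density_biased_sample(points, cell_size, target_size, e=0.5, seed=None):
    """Grid-based density biased sampling (illustrative sketch).

    points      : iterable of equal-length numeric tuples
    cell_size   : edge length of the grid cells used to estimate density
    target_size : expected number of sampled points
    e           : bias exponent; 0 gives uniform sampling, 1 gives roughly
                  the same expected number of samples from every cell
    Returns a list of (point, weight) pairs, where weight is the inverse
    of the point's inclusion probability.
    """
    rng = random.Random(seed)

    # Assign every point to a grid cell; the cell count is the density estimate.
    cells = defaultdict(list)
    for p in points:
        key = tuple(int(x // cell_size) for x in p)
        cells[key].append(p)

    # Scale factor so that the expected sample size equals target_size.
    norm = sum(len(members) ** (1.0 - e) for members in cells.values())
    alpha = target_size / norm

    sample = []
    for members in cells.values():
        n = len(members)
        # Inclusion probability falls as cell density rises (capped at 1).
        prob = min(1.0, alpha / n ** e)
        for p in members:
            if rng.random() < prob:
                sample.append((p, 1.0 / prob))
    return sample
```

The returned weights let a downstream clustering step treat each sampled point as standing in for several original points, so dense clusters are not underestimated after biased sampling. When the probability cap of 1 is reached in very sparse cells, the realized expected sample size falls below `target_size`.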

