
Error Evaluation and Emendation of Binned Kernel Density Estimators (分箱核密度估计的误差及其修正)

Cited by: 1
Abstract: The computational complexity of kernel density estimation (KDE) makes it difficult to apply to density function construction on large datasets. Approximating the kernel estimator by binning is an effective way to reduce this complexity. This paper presents a method for correcting the error of the simple binned kernel estimator: the center of gravity (centroid) of the data in each bin, rather than the bin center, is taken as the representative point of that data, so that the local distribution of the data is reflected more accurately. It is proved that the approximation accuracy of the method is O(δ^4) (relative to the bandwidth), which matches that of the linear binned kernel estimator. Experiments on synthetic and real-world datasets show that the revised simple binned kernel estimator is efficient to construct and accurate, and that it can be applied to clustering analysis of large datasets.
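The idea described in the abstract can be sketched in a few lines. This is only an illustrative reading of the abstract, not the authors' implementation: it assumes one-dimensional data, a Gaussian kernel, and equal-width bins of width `delta` starting at the data minimum, and the function names (`bin_representatives`, `binned_kde`, `exact_kde`) are hypothetical. The only difference between the two binning variants is the representative point chosen for each occupied bin.

```python
import numpy as np

def gaussian_kernel(u):
    # Standard normal density, used here as the kernel K (an assumption).
    return np.exp(-0.5 * u * u) / np.sqrt(2.0 * np.pi)

def exact_kde(x_eval, data, h):
    # Ordinary KDE: one kernel per data point, O(len(x_eval) * n) work.
    u = (x_eval[:, None] - data[None, :]) / h
    return gaussian_kernel(u).sum(axis=1) / (data.size * h)

def bin_representatives(data, delta, use_centroid=True):
    # Simple binning: each point falls into exactly one bin of width delta.
    lo = data.min()
    idx = np.floor((data - lo) / delta).astype(int)
    n_bins = idx.max() + 1
    counts = np.bincount(idx, minlength=n_bins)
    occupied = counts > 0
    if use_centroid:
        # Revised rule: represent a bin by the mean of the points inside it.
        sums = np.bincount(idx, weights=data, minlength=n_bins)
        reps = sums[occupied] / counts[occupied]
    else:
        # Classic simple binning: represent a bin by its geometric center.
        reps = lo + (np.nonzero(occupied)[0] + 0.5) * delta
    return reps, counts[occupied]

def binned_kde(x_eval, data, h, delta, use_centroid=True):
    # One weighted kernel per occupied bin instead of one per data point.
    reps, weights = bin_representatives(data, delta, use_centroid)
    u = (x_eval[:, None] - reps[None, :]) / h
    return (gaussian_kernel(u) * weights).sum(axis=1) / (data.size * h)
```

To experiment, one can draw a sample, evaluate `exact_kde` and both variants of `binned_kde` on a common grid, and compare the deviations from the exact estimate for several values of `delta`. The cost of the binned estimate scales with the number of occupied bins rather than with the number of data points, which is the source of the efficiency gain the abstract refers to.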
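Source: Journal of Data Acquisition and Processing (《数据采集与处理》), CSCD, Peking University Core Journal, 2009, No. 2, pp. 212-217 (6 pages)
Funding: Supported by the Natural Science Foundation of Jiangsu Province (BK20082140), the Natural Science Foundation of the Jiangsu Provincial Department of Education (06KJB520005), and the Jiangsu Province "Six Talent Peaks" program (06-E-028)
Keywords: kernel density estimation; binning rule; error estimation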
