Histogram-kernel Error and Its Application for Bin Width Selection in Histograms (Cited by: 1)



Abstract: Histogram and kernel estimators are usually regarded as the two main classical data-based nonparametric tools for estimating the underlying density function of a given data set. In this paper we integrate them and define a histogram-kernel error based on the integrated square error between the histogram and the binned kernel density estimator, and then study its asymptotic properties. As shown in the paper, for given prior kernel densities the histogram-kernel error depends only on the choice of bin width and the data. The asymptotically optimal bin width is derived by minimizing the mean histogram-kernel error. By comparison with Scott's optimal bin width formula for a histogram, a new method is proposed for constructing a data-based histogram without knowledge of the underlying density function. A Monte Carlo study is used to verify the usefulness of the method for different kinds of density functions and sample sizes.
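The two ingredients named in the abstract, Scott's bin width formula and an integrated square error between a histogram and a kernel estimate, can be illustrated with a rough sketch. This is not the paper's method: it assumes an ordinary (unbinned) Gaussian kernel estimator as a stand-in for the binned kernel density estimator, uses a simple midpoint rule for the integral, and picks the kernel bandwidth by Silverman's rule of thumb in the demo. All function names are hypothetical.

```python
import math
import random
import statistics


def scott_bin_width(data):
    """Scott's normal-reference rule: h = 3.49 * s * n^(-1/3)."""
    n = len(data)
    return 3.49 * statistics.stdev(data) * n ** (-1.0 / 3.0)


def histogram_density(data, h, origin):
    """Return a function x -> histogram density estimate with bin width h."""
    n = len(data)
    counts = {}
    for x in data:
        k = math.floor((x - origin) / h)
        counts[k] = counts.get(k, 0) + 1

    def f(x):
        return counts.get(math.floor((x - origin) / h), 0) / (n * h)

    return f


def gaussian_kde(data, bandwidth):
    """Return a plain Gaussian kernel density estimate (assumed stand-in)."""
    n = len(data)
    c = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def f(x):
        return c * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data)

    return f


def integrated_square_difference(f, g, lo, hi, steps=400):
    """Midpoint-rule approximation of the integral of (f - g)^2 over [lo, hi]."""
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx
        total += (f(x) - g(x)) ** 2
    return total * dx


def scan_bin_widths(data, widths, bandwidth, steps=400):
    """Evaluate the histogram-vs-kernel squared error for each candidate width."""
    lo = min(data) - 3.0 * bandwidth
    hi = max(data) + 3.0 * bandwidth
    kde = gaussian_kde(data, bandwidth)
    results = []
    for h in widths:
        hist = histogram_density(data, h, origin=lo)
        results.append((h, integrated_square_difference(hist, kde, lo, hi, steps)))
    return results


if __name__ == "__main__":
    random.seed(0)
    sample = [random.gauss(0.0, 1.0) for _ in range(200)]
    h_scott = scott_bin_width(sample)
    # Assumption: Silverman's rule of thumb for the kernel bandwidth.
    bw = 1.06 * statistics.stdev(sample) * len(sample) ** (-0.2)
    for h, err in scan_bin_widths(sample, [0.5 * h_scott, h_scott, 2.0 * h_scott], bw):
        print(f"bin width {h:.3f}: squared error vs kernel estimate {err:.5f}")
```

For a fixed kernel and bandwidth, the error computed here depends only on the data and the bin width, which is the property the abstract highlights. The paper's asymptotic formula for the optimal bin width is not reproduced.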

Authors: Xiu-xiang Wang, Jian-fang Zhang

Affiliations: Department of Mathematics; College of Management

Source: Acta Mathematicae Applicatae Sinica (应用数学学报（英文版）, English edition), indexed in SCIE and CSCD, 2012, No. 3, pp. 607-624 (18 pages)

Funding: Supported by the National Natural Science Foundation of China (No. 70371018, 70572074)

Keywords: histogram; binned kernel density estimator; bin width; histogram-kernel error; integrated square error

Classification: O174.41 [Science: Fundamental Mathematics]; TP391.41 [Automation and Computer Technology: Computer Application Technology]


References (21)

1. Beer, C.F., Swanepoel, J.W.H. Simple and effective number-of-bins circumference selectors for a histogram. Statistics and Computing, 9: 27-35 (1999).
2. Bowman, A.W. An alternative method of cross-validation for the smoothing of density estimates. Biometrika, 71: 353-360 (1984).
3. Cencov, N.N. Estimation of an unknown distribution density from observations. Soviet Math., 3: 1159-1562 (1962).
4. Daly, J.E. The construction of optimal histogram. Commun. Statist. Theory Meth., 17(9): 2921-2931 (1988).
5. Devroye, L. The double kernel method in density estimation. Annales de L'Institut Henri Poincare, 25: 533-580 (1989).
6. Faraway, J.J., Jhun, M. Bootstrap choice of bandwidth for density estimation. Journal of Statistical Planning and Inference, 85: 1119-1122 (1990).
7. Freedman, D., Diaconis, P. On the histogram as a density estimation: L2 theory. Zeitschrift fur Wahrscheinlichkeitstheorie und verwandte Gebiete, 57: 453-476 (1981).
8. He, K., Meeden, G. Selecting the number of bins in a histogram: A decision theoretical approach. Journal of Statistical Planning and Inference, 61: 49-59 (1997).
9. Parzen, E. Nonparametric statistical data modeling (with discussion). Journal of the American Statistical Association, 74: 105-131 (1979).
10. Rosenblatt, M. Remarks on some nonparametric estimates of a density function. Annals of Mathematical Statistics, 27: 832-837 (1956).


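Citing documents (1)

1. Chen Xuebin, Shan Liyang, Guo Rumin. Survey of histogram publication methods based on differential privacy [J]. Journal of Computer Applications (计算机应用), 2024, 44(10): 3114-3121.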

