
The Method of Outlier Detection Based on Nonparametric Density Estimation

Cited by: 2
Abstract: Outlier detection is a classical problem in statistics. Identifying outliers in tax assessment data and reducing their influence on the analysis is a worthwhile research problem. However, conventional outlier detection usually relies on global identification methods that are suited to unimodal distributions. Drawing on the theory of the local correlation integral, this paper proposes an identification method based on nonparametric density estimation. The method applies to multimodal distributions, can identify outliers that are local in nature, and retains strong detection ability even when outliers make up a high proportion of the sample. Using a sample of 10,920 enterprises from one city, an empirical analysis compares the tax assessment method currently used by the tax bureau with the proposed method. The results show that the method used by the tax bureau carries a considerable tax assessment risk (the risk of misjudgment).
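Source: 《数学的实践与认识》 (Mathematics in Practice and Theory), CSCD, Peking University Core Journal, 2014, No. 16, pp. 141-149 (9 pages)
Funding: National Natural Science Foundation of China (71003100); Ministry of Education Humanities and Social Sciences Research General Project (11YJC630270); Fundamental Research Funds for the Central Universities (11XNK027, 10XNF020)
Keywords: outlier detection; tax assessment; nonparametric density estimation; local correlation integral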
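The contrast drawn in the abstract between a global rule for unimodal data and a density-based local rule can be illustrated with a minimal sketch. This is not the paper's algorithm, whose details are not given in this record. It is a generic leave-one-out Gaussian kernel density estimate applied to made-up bimodal data. The function names, the bandwidth of 0.5, and the 0.1 ratio threshold are all assumptions chosen for illustration.

```python
import numpy as np


def global_sigma_outliers(samples, k=3.0):
    """Global rule: flag points more than k standard deviations from the mean."""
    x = np.asarray(samples, dtype=float)
    z = (x - x.mean()) / x.std()
    return np.flatnonzero(np.abs(z) > k)


def loo_kde_density(samples, bandwidth):
    """Leave-one-out Gaussian kernel density estimate at each sample point."""
    x = np.asarray(samples, dtype=float)
    z = (x[:, None] - x[None, :]) / bandwidth
    kernel = np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)
    np.fill_diagonal(kernel, 0.0)  # exclude each point's own contribution
    return kernel.sum(axis=1) / ((x.size - 1) * bandwidth)


def local_density_outliers(samples, bandwidth, ratio=0.1):
    """Local rule (illustrative): flag points whose estimated density is
    below `ratio` times the median density of the sample."""
    density = loo_kde_density(samples, bandwidth)
    return np.flatnonzero(density < ratio * np.median(density))


# Hypothetical bimodal data: two tight clusters plus one point in the gap.
cluster_a = np.linspace(-1.0, 1.0, 50)
cluster_b = np.linspace(9.0, 11.0, 50)
data = np.concatenate([cluster_a, cluster_b, [5.0]])  # index 100 is the gap point

print(global_sigma_outliers(data).tolist())
print(local_density_outliers(data, bandwidth=0.5).tolist())
```

In this toy setup the point at 5.0 coincides with the overall mean, so the global rule cannot flag it, while its estimated local density is close to zero. In practice the bandwidth would be chosen from the data, for example with a plug-in selector, rather than fixed by hand.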

