New kernel-based multiple imputation algorithm for data mining (cited by: 1)
Abstract  Missing data is one of the most common problems in data mining preprocessing. Usually there is no prior knowledge of the distribution of the data to be mined. In that case, non-parametric regression offers an effective way to handle missing data. Accordingly, a new method for handling missing data, a kernel-based non-parametric multiple imputation (MI) algorithm, is proposed under two missingness mechanisms: Missing At Random (MAR) and Missing Completely At Random (MCAR). Simulation experiments show that the algorithm outperforms the commonly used NORM algorithm in confidence-interval coverage, interval length, and relative efficiency.
Author  Su Yijuan (苏毅娟)
Source  Computer Engineering and Applications (《计算机工程与应用》), CSCD, PKU Core, 2008, No. 31, pp. 156-158, 172 (4 pages)
Funding  Scientific Research Project of the Education Department of Guangxi (No. 0626183)
Keywords  multiple imputation (MI); missing data; kernel function; non-parametric
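
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea it names (kernel-based non-parametric multiple imputation), not the paper's method. It assumes a single fully observed covariate `x`, a response `y` with missing entries marked as NaN, a Gaussian kernel with a fixed bandwidth `h`, and imputations formed as a Nadaraya-Watson fit plus a resampled residual. All function names and parameters are hypothetical; the `m` completed data sets are combined with Rubin's rules.

```python
import numpy as np


def gaussian_kernel(u):
    """Unnormalized Gaussian kernel; the constant cancels in the weights."""
    return np.exp(-0.5 * u ** 2)


def nw_predict(x_obs, y_obs, x_new, h):
    """Nadaraya-Watson estimate of E[Y | X = x_new] from observed pairs.

    Assumes every x_new has observed neighbours within a few bandwidths,
    so that the weight sums below are not zero.
    """
    w = gaussian_kernel((x_new[:, None] - x_obs[None, :]) / h)
    return (w @ y_obs) / w.sum(axis=1)


def kernel_multiple_imputation(x, y, m=5, h=0.5, rng=None):
    """Return m completed copies of y; NaN entries of y are imputed.

    Assumption of this sketch: each imputed value is the kernel fit at x
    plus a residual drawn with replacement from the observed cases.
    """
    rng = np.random.default_rng(rng)
    miss = np.isnan(y)
    x_obs, y_obs = x[~miss], y[~miss]
    resid = y_obs - nw_predict(x_obs, y_obs, x_obs, h)
    fitted = nw_predict(x_obs, y_obs, x[miss], h)
    completed = []
    for _ in range(m):
        y_fill = y.copy()
        y_fill[miss] = fitted + rng.choice(resid, size=int(miss.sum()), replace=True)
        completed.append(y_fill)
    return completed


def rubin_combine(estimates, variances):
    """Combine m point estimates and their variances with Rubin's rules."""
    m = len(estimates)
    q_bar = float(np.mean(estimates))        # pooled point estimate
    u_bar = float(np.mean(variances))        # within-imputation variance
    b = float(np.var(estimates, ddof=1))     # between-imputation variance
    total_var = u_bar + (1.0 + 1.0 / m) * b  # total variance
    return q_bar, total_var
```

To pool an estimate of the mean of `y`, one could pass `[c.mean() for c in completed]` and `[c.var(ddof=1) / len(c) for c in completed]` to `rubin_combine`. A fixed bandwidth is the simplest choice; a data-driven bandwidth would be a natural refinement.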

References (10)

  • 1. Cios K, Kurgan L. Trends in data mining and knowledge discovery[M]//Pal N, Jain L, Teoderesku N. Knowledge Discovery in Advanced Information Systems. [S.l.]: Springer, 2002.
  • 2. Little R, Rubin D. Statistical analysis with missing data[M]. New York: John Wiley & Sons Inc, 1987.
  • 3. Faris P. Multiple imputation versus data enhancement for dealing with missing data in observational health care outcome analyses[J]. Journal of Clinical Epidemiology, 2002, 55: 184-191.
  • 4. Taylor J, Murray S, Hsu C. Survival estimation and testing via multiple imputation[J]. Statistics & Probability Letters, 2002, 58: 221-232.
  • 5. Zhang S C. Kernel-based multi-imputation for missing data[C]//Accepted in the 4th International Conference on Active Media Technology, 2006.
  • 6. Little R, Rubin D. Statistical analysis with missing data[M]. 2nd ed. New York: John Wiley & Sons, 1987.
  • 7. Barnard J, Rubin D. Small-sample degrees of freedom with multiple imputation[J]. Biometrika, 1999, 86: 948-955.
  • 8. Schafer J. NORM: multiple imputation of incomplete multivariate data under a normal model[EB/OL]. (1999). http://www.stat.psu.edu/~jls/misoftwa.html.
  • 9. Wang Q, Rao J. Empirical likelihood-based inference in linear models with missing data[J]. Canadian J of Statistics, 2002, 29: 563-576.
  • 10. Blake C, Merz C. UCI repository of machine learning databases[EB/OL]. Irvine, CA: University of California, (1998). http://www.ics.uci.edu/~mlearn/MLRepository.html.
