
Uniform Sampling Design and Numerical Analysis in Big Data Mining

Cited by: 9
Abstract Given the multidimensional, sparse, and dynamic nature of the process that generates big data, a big data set is not equivalent to a statistical population. Even for a static big data set, random sampling retains indispensable methodological value for parameter estimation and inference about the population. Large-scale data analysis often requires reducing dimensionality and computation, yet it is frequently unclear how sampling should be carried out. This paper therefore proposes a basic strategy for applying uniform sampling in big data mining and carries out a numerical analysis on simulated data and a medical cardiotocography (fetal heart rate and uterine contraction monitoring) data set. The results show that uniform sampling outperforms the methods commonly used in the existing literature in reducing the error rates of decision trees, AdaBoost, bagging, and random forests. The strategy can serve as a reference for data mining methods aimed at big data, and it provides supporting evidence for the effectiveness of sampling in big data analysis.
Authors Li Yi (李毅), Mi Zichuan (米子川)
Source Journal of Statistics and Information (《统计与信息论坛》), CSSCI, Peking University Core Journal, 2015, No. 4, pp. 3-6 (4 pages)
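Funding National Natural Science Foundation of China project "Linkage Study of Homogeneity Tests in Pedigree Sequence Data" (31470070); Shanxi Provincial Natural Science Foundation project "Integrative Study of Genotype Patterns in Genomic Selection" (2014011030-4); Shanxi Provincial Research Funding Program for Returned Overseas Scholars project "Genomic Selection Based on Statistical Learning Theory" (2013-72)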
Keywords uniform design; data mining; big data sampling