Block and SCAD Penalty based Regression for Large-scale Data
(大规模数据的分块SCAD惩罚回归分析)

Cited by: 2

Abstract: Regression analysis of large-scale data is often infeasible owing to the limits of computer main memory. Borrowing the divide-and-conquer idea of "breaking the whole into parts", we propose a new regression method: block SCAD-penalized regression. Its core steps are: split the full dataset into several blocks, fit a SCAD-penalized regression on each block, and take the simple average of the block-wise estimates as an approximation to the full-sample regression coefficients. We further establish the variable-selection performance and asymptotic properties of the proposed method theoretically. Numerical simulations and a real-world application show that block SCAD-penalized regression not only reduces memory requirements and computation time substantially, but also matches full-sample regression closely in variable selection, parameter estimation, and prediction.
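
As an illustration of the procedure described in the abstract, here is a minimal Python sketch of block SCAD-penalized regression. This is not the authors' code: the SCAD subproblem is solved by coordinate descent with the SCAD thresholding rule of Fan and Li (2001), the concavity parameter a = 3.7 is the conventional default, columns of X are assumed standardized (mean 0, squared norm n), and all function names are hypothetical.

```python
import numpy as np

def scad_threshold(z, lam, a=3.7):
    """SCAD thresholding rule (Fan & Li, 2001) for a univariate update z."""
    az = abs(z)
    if az <= 2.0 * lam:                       # soft-thresholding zone
        return np.sign(z) * max(az - lam, 0.0)
    if az <= a * lam:                         # intermediate SCAD zone
        return ((a - 1.0) * z - np.sign(z) * a * lam) / (a - 2.0)
    return z                                  # no shrinkage beyond a*lam

def scad_regression(X, y, lam, a=3.7, n_iter=200, tol=1e-6):
    """SCAD-penalized least squares by coordinate descent.
    Assumes each column of X has mean 0 and squared norm n."""
    n, p = X.shape
    beta = np.zeros(p)
    r = y.astype(float).copy()                # residual y - X @ beta
    for _ in range(n_iter):
        max_delta = 0.0
        for j in range(p):
            z = X[:, j] @ r / n + beta[j]     # coordinate-wise update target
            b_new = scad_threshold(z, lam, a)
            r += X[:, j] * (beta[j] - b_new)  # keep residual in sync
            max_delta = max(max_delta, abs(b_new - beta[j]))
            beta[j] = b_new
        if max_delta < tol:                   # stop once updates stabilize
            break
    return beta

def block_scad(X, y, lam, n_blocks):
    """Block SCAD: split the rows into blocks, fit a SCAD-penalized
    regression on each block, and average the block-wise estimates."""
    fits = [scad_regression(Xb, yb, lam)
            for Xb, yb in zip(np.array_split(X, n_blocks),
                              np.array_split(y, n_blocks))]
    return np.mean(fits, axis=0)
```

One caveat of the simple-average step: a coefficient set to zero in some blocks but not others averages to a small nonzero value, so an exactly sparse combined model may require a final thresholding pass. The paper proves variable-selection properties for its estimator; the thresholding remedy mentioned here is only an illustrative possibility, not taken from the paper.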
Authors: CAI Chao, XU Qi-fa, JIANG Cui-xia, WANG Yan-ming (School of Statistics, Shandong Technology and Business University, Yantai 264005, Shandong, China; School of Management, Hefei University of Technology, Hefei 230009, Anhui, China)
Source: Journal of Applied Statistics and Management (数理统计与管理), CSSCI, Peking University Core Journal, 2018, No. 6, pp. 1023-1040 (18 pages)
Funding: Supported by the National Natural Science Foundation of China (71671056), the National Social Science Fund of China (14BTJ028, 15BJY008), the Ministry of Education Humanities and Social Sciences Research Planning Fund (14YJA790015), and the Shandong Provincial Social Science Planning Project (18DTJJ01)
Keywords: regression analysis; large-scale data; block data; SCAD penalty; variable selection

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部