Optimal Subsampling Algorithm for Big Data Ridge Regression (cited by: 9)
Abstract: To improve computational efficiency on big data, Wang et al. (2018) proposed an optimal subsampling algorithm for logistic regression that saves substantial computing time while preserving the accuracy of parameter estimation. To handle multicollinearity among variables, this paper proposes an optimal subsampling algorithm for the ridge regression model and proves the consistency and asymptotic normality of the resulting parameter estimator. Numerical simulations and a real-data analysis are used to evaluate the algorithm. The results show that the model built from the optimal subsample achieves parameter-estimation accuracy close to that of the full-sample model while greatly reducing computation time.
Authors: LI Lili; JIN Shilei; ZHOU Kaihe (School of Economics, Qingdao University, Qingdao 266100)
Source: Journal of Systems Science and Mathematical Sciences (《系统科学与数学》), CSCD, Peking University Core Journal, 2022, No. 1, pp. 50-63 (14 pages)
Funding: Supported by the National Social Science Fund of China (2019BTJ028).
Keywords: big data; optimal subsampling algorithm; ridge regression
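
The abstract does not state the paper's subsampling probabilities, so the following is only a minimal sketch of the general two-step idea, not the authors' algorithm. It assumes probabilities proportional to |residual| times the norm of x, loosely modeled on the logistic-regression rule of Wang et al. (2018). The function names `ridge_fit` and `subsample_ridge` and the parameters `lam`, `r0`, and `r` are hypothetical.

```python
import numpy as np

def ridge_fit(X, y, lam, weights=None):
    """Weighted ridge: argmin_b sum_i w_i (y_i - x_i'b)^2 + lam * ||b||^2."""
    if weights is None:
        weights = np.ones(len(y))
    Xw = X * weights[:, None]                     # rows of X scaled by w_i
    A = X.T @ Xw + lam * np.eye(X.shape[1])       # X'WX + lam * I
    return np.linalg.solve(A, Xw.T @ y)           # solve against X'Wy

def subsample_ridge(X, y, lam, r0, r, rng):
    """Two-step subsampling estimate (probability rule is an assumption)."""
    n = len(y)
    # Step 1: pilot estimate from a uniform subsample of size r0.
    # Weights n / r0 rescale the subsample loss to the full-data scale.
    idx0 = rng.choice(n, size=r0, replace=True)
    beta0 = ridge_fit(X[idx0], y[idx0], lam, np.full(r0, n / r0))
    # Step 2: assumed probabilities proportional to |residual| * ||x_i||.
    score = np.abs(y - X @ beta0) * np.linalg.norm(X, axis=1)
    probs = score / score.sum()
    idx = rng.choice(n, size=r, replace=True, p=probs)
    # Inverse-probability weights keep the weighted loss unbiased
    # for the full-data loss.
    w = 1.0 / (r * probs[idx])
    return ridge_fit(X[idx], y[idx], lam, w)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 20000, 5
    # A shared factor makes the columns correlated (mild collinearity).
    X = rng.normal(size=(n, d)) + 0.5 * rng.normal(size=(n, 1))
    y = X @ np.arange(1.0, d + 1) + rng.normal(size=n)
    print("full data:", ridge_fit(X, y, lam=1.0))
    print("subsample:", subsample_ridge(X, y, 1.0, r0=500, r=2000, rng=rng))
```

The second-step fit uses only `r` rows in the d-by-d solve, which is where the time saving would come from. Computing the scores still takes one pass over the full data.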