Research of Subsampling Algorithm Based on Logistic Model in Big Data (大数据下Logistic模型的最优子抽样算法研究) · Cited by: 1
Abstract: Sampling methods play an important role in big data research. Subsampling, as one such method, can deal with large data volumes very efficiently, and both linear regression models and logistic regression models have corresponding subsampling approaches. This article applies two subsampling methods for the binary logistic model under big data: the general (uniform) subsampling method and the two-step optimal subsampling method. Real data are used to evaluate the performance of the algorithms, with the following conclusions: the logistic regression model built with the two-step subsampling algorithm outperforms the model built with general subsampling in estimation accuracy; subsampling under the L-optimality criterion is slightly less accurate than subsampling under the A-optimality criterion, but requires less computing time.
Author: HAN Kun-ling (韩坤凌), School of Mathematics and Big Data, Dezhou University, Dezhou Shandong 253023, China
Source: Journal of Dezhou University (《德州学院学报》), 2023, No. 4, pp. 1-4
Keywords: optimal subsampling; big data; logistic regression
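The two-step procedure evaluated in the article can be sketched as follows. This is a minimal illustration in Python of the general pattern of two-step optimal subsampling for logistic regression under the L-optimality criterion: a uniform pilot subsample yields a pilot estimate, sampling probabilities are set proportional to |y_i - p_i| · ||x_i||, and a weighted MLE is fit on the second subsample. The function names, sample sizes, and simulated data are illustrative assumptions, not the authors' code or data.

```python
import numpy as np

def sigmoid(z):
    # Clip to avoid overflow in exp for extreme linear predictors
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

def weighted_logistic_mle(X, y, w, iters=100, tol=1e-8):
    """Weighted logistic-regression MLE via Newton's method."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = sigmoid(X @ beta)
        grad = X.T @ (w * (y - p))                  # weighted score
        W = w * p * (1.0 - p)                       # weighted Hessian diagonal
        H = X.T @ (X * W[:, None])
        step = np.linalg.solve(H + 1e-8 * np.eye(X.shape[1]), grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def two_step_subsample(X, y, r0=200, r=800, seed=None):
    """Two-step subsampling with L-optimality-style probabilities."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Step 1: uniform pilot subsample -> pilot estimate
    idx0 = rng.choice(n, size=r0, replace=True)
    beta0 = weighted_logistic_mle(X[idx0], y[idx0], np.ones(r0))
    # L-optimal probabilities: pi_i proportional to |y_i - p_i| * ||x_i||
    scores = np.abs(y - sigmoid(X @ beta0)) * np.linalg.norm(X, axis=1)
    pi = scores / scores.sum()
    # Step 2: weighted MLE on the optimal subsample (inverse-probability weights)
    idx1 = rng.choice(n, size=r, replace=True, p=pi)
    return weighted_logistic_mle(X[idx1], y[idx1], 1.0 / (pi[idx1] * r))

# Illustrative run on simulated data (n large, only r0 + r rows are fit)
rng = np.random.default_rng(0)
n, d = 100_000, 3
X = np.column_stack([np.ones(n), rng.standard_normal((n, d - 1))])
beta_true = np.array([0.5, 1.0, -1.0])
y = (rng.random(n) < sigmoid(X @ beta_true)).astype(float)
beta_hat = two_step_subsample(X, y, seed=1)
```

The A-optimality variant mentioned in the abstract would replace ||x_i|| by ||M⁻¹x_i||, where M is the pilot-estimate Hessian; that extra matrix solve per observation is consistent with the abstract's finding that A-optimality costs more computing time for somewhat better accuracy.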