Abstract
Sampling methods play an important role in big data research. Subsampling, one such method, deals very efficiently with large data volumes, and corresponding subsampling approaches exist for both linear regression and logistic regression models. This article applies two subsampling methods for the binary logistic model to big data: the general subsampling method and the two-step optimal subsampling method. Real data are used to evaluate the performance of the algorithms, with the following conclusions. The logistic regression model built with the two-step subsampling algorithm achieves higher estimation accuracy than the model built with general subsampling. Subsampling under the L-optimality criterion gives slightly lower estimation accuracy than subsampling under the A-optimality criterion, but requires less computing time.
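The abstract does not give the formulas behind the two-step procedure, so the following is only a minimal sketch under stated assumptions, not the article's implementation. It assumes that "general subsampling" means uniform sampling with replacement, and that the L-optimality probabilities take the form commonly used in the optimal subsampling literature, with each probability proportional to |y_i - p_i(beta0)| * ||x_i||, where beta0 is a pilot estimate. The function names and the parameters `r0` and `r` are hypothetical.

```python
import numpy as np


def fit_weighted_logistic(X, y, w, n_iter=50, tol=1e-8):
    """Maximise the weighted logistic log-likelihood by Newton's method."""
    w = w / w.mean()  # rescaling the weights does not change the maximiser
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (w * (y - p))
        hess = (X * (w * p * (1.0 - p))[:, None]).T @ X
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def uniform_subsample(X, y, r, rng):
    """Baseline (assumed meaning of 'general subsampling'): uniform draw."""
    idx = rng.choice(X.shape[0], size=r, replace=True)
    return fit_weighted_logistic(X[idx], y[idx], np.ones(r))


def two_step_subsample(X, y, r0, r, rng):
    """Two-step subsampling with assumed L-optimality probabilities."""
    n = X.shape[0]

    # Step 1: uniform pilot sample of size r0 gives a rough estimate beta0.
    idx0 = rng.choice(n, size=r0, replace=True)
    beta0 = fit_weighted_logistic(X[idx0], y[idx0], np.ones(r0))

    # Step 2: probabilities proportional to |y_i - p_i(beta0)| * ||x_i||
    # (an assumption; the abstract does not state this formula).
    p = 1.0 / (1.0 + np.exp(-(X @ beta0)))
    score = np.abs(y - p) * np.linalg.norm(X, axis=1)
    pi = score / score.sum()
    idx1 = rng.choice(n, size=r, replace=True, p=pi)

    # Combine both samples and weight each point by 1 / (sampling probability).
    idx = np.concatenate([idx0, idx1])
    w = np.concatenate([np.full(r0, float(n)), 1.0 / pi[idx1]])
    return fit_weighted_logistic(X[idx], y[idx], w)
```

Both functions take a full design matrix `X`, a 0/1 response vector `y` stored as floats, and a `numpy.random.Generator`. An A-optimality variant would, under the same assumptions, replace `np.linalg.norm(X, axis=1)` with the norm of each row after multiplying by the inverse of the pilot Hessian, which costs an extra matrix product over all n rows and is consistent with the longer running time reported above.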
Author
HAN Kun-ling (韩坤凌), School of Mathematics and Big Data, Dezhou University, Dezhou, Shandong 253023, China
Source
Journal of Dezhou University (《德州学院学报》)
2023, No. 4, pp. 1-4 (4 pages)