Research of Subsampling Algorithm Based on Logistic Model in Big Data (大数据下Logistic模型的最优子抽样算法研究) · Cited by: 1
Abstract: Sampling methods play an important role in big data research. Subsampling, as one such method, can deal with large data volumes very efficiently, and both linear regression models and logistic regression models have corresponding subsampling approaches. This article applies two subsampling methods for the binary logistic model under big data: the general (uniform) subsampling method and the two-step optimal subsampling method. Real data are used to evaluate the performance of the algorithms, with the following conclusions: the logistic regression model built with the two-step subsampling algorithm outperforms the model built with general subsampling in estimation accuracy; subsampling under the L-optimality criterion is slightly less accurate than subsampling under the A-optimality criterion, but requires less computing time.
Author: HAN Kun-ling (韩坤凌), School of Mathematics and Big Data, Dezhou University, Dezhou Shandong 253023, China
Source: Journal of Dezhou University (《德州学院学报》), 2023, No. 4, pp. 1-4
Keywords: optimal subsampling; big data; logistic regression
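The two-step procedure evaluated in the article can be sketched as follows. This is a minimal illustration in Python of the general pattern of two-step optimal subsampling for logistic regression under the L-optimality criterion: a uniform pilot subsample yields a pilot estimate, sampling probabilities are set proportional to |y_i - p_i| · ||x_i||, and a weighted MLE is fit on the second subsample. The function names, sample sizes, and simulated data are illustrative assumptions, not the authors' code or data.

```python
import numpy as np

def sigmoid(z):
    # Clip to avoid overflow in exp for extreme linear predictors
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

def weighted_logistic_mle(X, y, w, iters=100, tol=1e-8):
    """Weighted logistic-regression MLE via Newton's method."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = sigmoid(X @ beta)
        grad = X.T @ (w * (y - p))                  # weighted score
        W = w * p * (1.0 - p)                       # weighted Hessian diagonal
        H = X.T @ (X * W[:, None])
        step = np.linalg.solve(H + 1e-8 * np.eye(X.shape[1]), grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def two_step_subsample(X, y, r0=200, r=800, seed=None):
    """Two-step subsampling with L-optimality-style probabilities."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Step 1: uniform pilot subsample -> pilot estimate
    idx0 = rng.choice(n, size=r0, replace=True)
    beta0 = weighted_logistic_mle(X[idx0], y[idx0], np.ones(r0))
    # L-optimal probabilities: pi_i proportional to |y_i - p_i| * ||x_i||
    scores = np.abs(y - sigmoid(X @ beta0)) * np.linalg.norm(X, axis=1)
    pi = scores / scores.sum()
    # Step 2: weighted MLE on the optimal subsample (inverse-probability weights)
    idx1 = rng.choice(n, size=r, replace=True, p=pi)
    return weighted_logistic_mle(X[idx1], y[idx1], 1.0 / (pi[idx1] * r))

# Illustrative run on simulated data (n large, only r0 + r rows are fit)
rng = np.random.default_rng(0)
n, d = 100_000, 3
X = np.column_stack([np.ones(n), rng.standard_normal((n, d - 1))])
beta_true = np.array([0.5, 1.0, -1.0])
y = (rng.random(n) < sigmoid(X @ beta_true)).astype(float)
beta_hat = two_step_subsample(X, y, seed=1)
```

The A-optimality variant mentioned in the abstract would replace ||x_i|| by ||M⁻¹x_i||, where M is the pilot-estimate Hessian; that extra matrix solve per observation is consistent with the abstract's finding that A-optimality costs more computing time for somewhat better accuracy.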