α-Optimal Subsampling for Joint Mean and Variance Models Under Heteroscedastic Big Data
Cited by: 1


Abstract: With the development of information technology, fields such as economics, finance, and industry generate extremely large volumes of data, and these data are often heteroscedastic. Traditional statistical models and methods struggle to model big data of this kind. Subsampling is an important method for handling big data. This paper studies subsampling for joint mean and variance models in a heteroscedastic big data setting. The main contributions are as follows: a joint mean and variance model is built for heteroscedastic big data and, under certain conditions, the consistency and asymptotic normality of the subsample parameter estimator are established based on the A-optimality and L-optimality criteria; an α-optimal subsampling algorithm for joint mean and variance models under heteroscedastic big data is proposed for the first time. Numerical simulations and an empirical analysis show that the sampling algorithm improves estimation accuracy and reduces computational cost.

Authors: XIONG Zhengyu; WU Liucang; YANG Lanjun (Faculty of Science, Kunming University of Science and Technology, Kunming 650500; Center for Applied Statistics, Kunming University of Science and Technology, Kunming 650500)

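Affiliations: Faculty of Science, Kunming University of Science and Technology; Center for Applied Statistics, Kunming University of Science and Technology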

Source: Journal of Systems Science and Mathematical Sciences (CSCD; Peking University Core Journal), 2024, No. 7, pp. 2146-2172 (27 pages)

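Funding: Supported by the National Natural Science Foundation of China (12261051), the Yunnan Fundamental Research Projects, Key Project (202401AS070061), and the Kunming University of Science and Technology Philosophy and Social Sciences Research and Innovation Team (CXTD2023005).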

Keywords: heteroscedastic big data; joint mean and variance models; α-optimal subsampling

Classification: O212.2 [Science: Probability Theory and Mathematical Statistics]

