Abstract
In the era of big data, penalty-based regularization is an important approach to variable selection for high-dimensional data. The Lasso estimator is a commonly used variable selection method, and the value of its regularization parameter directly affects the performance of the selected model, making it key to the success of the regularization method. For the Lasso estimator, the L-curve criterion for regularization parameter selection is modified and a new L-curve (LC) criterion is proposed. Numerical simulations and a real-data application show that, compared with criteria such as cross-validation (CV), generalized cross-validation (GCV), and the Bayesian information criterion (BIC), the LC criterion selects the true model with higher probability and yields a smaller model error.
Authors
吴炜明
王延新
WU Weiming; WANG Yanxin (School of Science, Ningbo University of Technology, Ningbo, Zhejiang 315211, China; Business School, Anhui University of Technology, Ma'anshan, Anhui 243032, China)
Source
《西南师范大学学报(自然科学版)》
CAS
2022, No. 1, pp. 36-42 (7 pages)
Journal of Southwest China Normal University (Natural Science Edition)
Funding
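National Statistical Science Research Project of China (2019LY06)
Zhejiang Provincial Natural Science Foundation of China (LY18A010026)
National College Students' Innovation and Entrepreneurship Training Program (201911058025)
Ningbo Natural Science Foundation (2021J143)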
Keywords
high-dimensional data
variable selection
Lasso
LC criterion
regularization parameter selection