Abstract
Rapid advances in computing have made data far easier to acquire and store, and, particularly in the era of big data, many enterprises have accumulated large volumes of data. At the same time, data dimensionality keeps growing and noise variables are increasingly common, so a key problem in modeling is to select a small number of important variables from a high-dimensional set. This article proposes a regularized Beta regression for proportional responses taking values in the interval (0,1) and studies maximum likelihood estimation under the LASSO, SCAD and MCP penalties, so that variable selection and coefficient estimation are carried out simultaneously. The asymptotic and oracle properties of the estimators are also proved. Simulation results show that MCP performs better than SCAD and LASSO, and that SCAD outperforms LASSO as the sample size increases. Finally, the method is applied to a study of the factors influencing the dividend rates of Chinese listed companies.
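The penalized likelihood named in the abstract can be illustrated with a minimal sketch. The abstract does not give the parametrization or tuning constants, so everything below is an assumption made for illustration: a mean-precision Beta parametrization with a logit link, the commonly used SCAD constant `a = 3.7`, an MCP constant `gamma = 3.0`, an unpenalized intercept in the first column of `X`, and hypothetical function names throughout. It is not the authors' implementation.

```python
import math

def lasso_penalty(beta, lam):
    """LASSO penalty: lam * |beta|."""
    return lam * abs(beta)

def scad_penalty(beta, lam, a=3.7):
    """SCAD penalty in its usual piecewise form; assumes a > 2."""
    b = abs(beta)
    if b <= lam:
        return lam * b
    if b <= a * lam:
        return (2 * a * lam * b - b * b - lam * lam) / (2 * (a - 1))
    return lam * lam * (a + 1) / 2  # constant beyond a * lam

def mcp_penalty(beta, lam, gamma=3.0):
    """MCP penalty in its usual piecewise form; assumes gamma > 1."""
    b = abs(beta)
    if b <= gamma * lam:
        return lam * b - b * b / (2 * gamma)
    return gamma * lam * lam / 2  # constant beyond gamma * lam

def beta_neg_loglik(coefs, phi, X, y):
    """Negative Beta log-likelihood (assumed logit mean link, precision phi)."""
    total = 0.0
    for row, yi in zip(X, y):
        eta = sum(c * x for c, x in zip(coefs, row))
        mu = 1.0 / (1.0 + math.exp(-eta))      # mean in (0, 1)
        p, q = mu * phi, (1.0 - mu) * phi      # Beta shape parameters
        total += (math.lgamma(phi) - math.lgamma(p) - math.lgamma(q)
                  + (p - 1) * math.log(yi) + (q - 1) * math.log(1 - yi))
    return -total

def penalized_objective(coefs, phi, X, y, lam, penalty=mcp_penalty):
    """Average negative log-likelihood plus the penalty on non-intercept terms."""
    n = len(y)
    fit = beta_neg_loglik(coefs, phi, X, y) / n
    return fit + sum(penalty(c, lam) for c in coefs[1:])
```

Minimizing `penalized_objective` over `coefs` and `phi` for a grid of `lam` values would give a regularization path. Unlike LASSO, the SCAD and MCP penalties level off for large coefficients, which is the usual explanation for their smaller bias on strong signals.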
Source
《统计与信息论坛》
CSSCI
Peking University Core Journal (北大核心)
2016, No. 8, pp. 14-20 (7 pages)
Journal of Statistics and Information
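Funding
National Natural Science Foundation of China project "Group Variable Selection for Generalized Linear Models and Its Application in Credit Scoring" (71471152)
National Social Science Fund of China major project "Research on Big Data and the Development of Statistical Theory" (13&ZD148)
National Social Science Fund of China project "High-Dimensional Variable Selection Methods for Big Data and Their Applications" (13CTJ001)
National Statistical Science Research key project "Research on Credit Scoring under Big Data" (2015629)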
Keywords
Beta regression
variable selection
regularization
dividend rate