Abstract
Regression methods are important tools for data analysis. Regression with the smoothly clipped absolute deviation (SCAD) penalty is widely used in big data analysis owing to its approximate unbiasedness for high-dimensional data (see Fan and Li, 2001). In the big data setting, however, the SCAD regression problems to be solved often involve very large datasets distributed across different geographic locations, so the memory usage of the computation must be reconsidered when designing solution algorithms. Conventional optimization algorithms for SCAD regression (LQA, LLA, ADMM, etc.) typically update all variables at every iteration, which leads to large memory requirements and makes them ill-suited to big data. Stochastic coordinate descent (SCD) has been widely applied to large-scale distributed optimization because each of its subproblems requires little memory (see Nesterov, 2012). In theory, however, SCD can currently handle only regression problems with convex penalties; since the SCAD penalty is nonconvex and nonsmooth, existing stochastic coordinate descent methods cannot handle this problem. This paper first analyzes the SCAD regression model and shows that its loss function has a Lipschitz continuous gradient and its penalty function is semi-convex; moreover, by existing results, critical points of the SCAD regression problem already guarantee good statistical properties. Based on this analysis, a new stochastic coordinate descent method, variable Bregman stochastic coordinate descent (VBSCD), is introduced; it solves the regression problem with the SCAD penalty, and every accumulation point of the algorithm is a critical point of the SCAD regression model. Finally, computational experiments further illustrate the effectiveness of the algorithm for SCAD regression: across different numbers of variable blocks, the number of iteration rounds needed to reach a critical point remains relatively stable, while the per-iteration memory requirement decreases as the number of blocks increases. The method can be widely applied to solving SCAD regression problems in the big data setting.
The regression problem with the smoothly clipped absolute deviation (SCAD) penalty is widely used in big data analysis because of its approximate unbiasedness for high-dimensional data (see Fan and Li, 2001). However, in the context of big data, the data are often large in volume and distributed across different locations, which makes the conventional algorithms for SCAD regression (LQA, LLA, ADMM, etc.) ill-suited to current solution needs. Stochastic coordinate descent (SCD) has been widely used in large-scale distributed optimization because of its small per-subproblem memory requirements (see Nesterov, 2012). In theory, however, the SCD algorithm can only deal with regression problems with convex penalties, so existing stochastic coordinate descent methods cannot handle the nonconvex, nonsmooth SCAD regression problem. This paper first analyzes the SCAD regression model and shows that its loss function has a Lipschitz continuous gradient and its penalty function is semi-convex. In addition, by existing results, critical points of the SCAD regression problem already guarantee good statistical properties. Based on the analysis above, this paper introduces a new method, variable Bregman stochastic coordinate descent (VBSCD), which solves the regression problem with the SCAD penalty; every accumulation point of the algorithm is a critical point of the SCAD regression model. Finally, the effectiveness of the proposed algorithm for SCAD regression is further illustrated by computational experiments. The algorithm proposed in this paper can be widely applied to solving SCAD regression in the big data era.
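As background for the abstract above, the SCAD penalty of Fan and Li (2001) and its coordinate-wise thresholding (proximal) step, the basic building block of a coordinate descent iteration for SCAD-penalized least squares, can be sketched in a few lines. This is an illustrative sketch with hypothetical function names, not the paper's VBSCD method:

```python
import numpy as np

def scad_penalty(t, lam, a=3.7):
    """SCAD penalty (Fan and Li, 2001), applied elementwise.

    Piecewise: L1 near zero, a quadratic transition, then constant.
    The constant tail is what gives approximate unbiasedness for
    large coefficients.  a=3.7 is the value suggested by Fan and Li.
    """
    t = np.abs(np.asarray(t, dtype=float))
    quad = (2 * a * lam * t - t**2 - lam**2) / (2 * (a - 1))
    flat = lam**2 * (a + 1) / 2
    return np.where(t <= lam, lam * t,
                    np.where(t <= a * lam, quad, flat))

def scad_threshold(z, lam, a=3.7):
    """Closed-form minimizer of 0.5*(theta - z)**2 + scad_penalty(theta, lam, a),
    i.e. the coordinate-wise proximal step of a coordinate descent
    iteration for SCAD-penalized least squares (unit curvature assumed)."""
    z = np.asarray(z, dtype=float)
    az = np.abs(z)
    soft = np.sign(z) * np.maximum(az - lam, 0.0)          # |z| <= 2*lam
    mid = ((a - 1) * z - np.sign(z) * a * lam) / (a - 2)   # 2*lam < |z| <= a*lam
    return np.where(az <= 2 * lam, soft,
                    np.where(az <= a * lam, mid, z))
```

Note that `scad_threshold` returns `z` unchanged when `|z| > a*lam`: large coefficients are not shrunk at all, which is the approximate unbiasedness mentioned in the abstract, in contrast to the constant shrinkage of the L1 (lasso) penalty.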
Authors
ZHAO Lei; CHEN Ding; ZHU Daoli (Antai College of Economics & Management, Shanghai Jiao Tong University, Shanghai 200030, China; Sino-US Global Logistics Institute, Shanghai Jiao Tong University, Shanghai 200030, China)
Source
Shanghai Management Science (《上海管理科学》)
2019, No. 5, pp. 97-103 (7 pages)
Funding
National Natural Science Foundation of China (71471112, 71871140)
Keywords
smoothly clipped absolute deviation
regression
stochastic coordinate descent