Abstract
Lasso is a variable selection method based on the L1 norm. Compared with other methods, Lasso not only accurately selects the variables strongly correlated with the class attribute, but also keeps the variable selection stable. However, applying Lasso to high-dimensional, massive gene data incurs excessive computational cost. To address this problem, the paper proposes a split-and-conquer Lasso method: the data set is first divided into K parts and variable selection is performed on each part; the results from the parts are then merged and variable selection is performed again. Test results show that the split-and-conquer Lasso method performs well at selecting associated variables in massive gene data.
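The two-stage procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The abstract does not say whether the data are split over samples or over variables, so the sketch assumes the variables (columns) are split into K blocks. The function names (`lasso_cd`, `select`, `split_and_conquer_lasso`), the penalty value `lam`, the interleaved block layout, and the synthetic data are all hypothetical choices made for illustration. A plain coordinate-descent solver is used only to keep the example dependency-free.

```python
import random

def soft_threshold(z, t):
    """Soft-thresholding operator induced by the L1 penalty."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for (1/2n)*||y - Xb||^2 + lam*||b||_1.
    X is a list of rows; returns the coefficient list b."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = list(y)  # residual y - Xb, with b = 0 initially
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual (column j excluded).
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * b[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * X[i][j]
                b[j] = new_bj
    return b

def select(X, y, cols, lam):
    """Run Lasso on the given columns; return those with nonzero coefficients."""
    sub = [[row[j] for j in cols] for row in X]
    b = lasso_cd(sub, y, lam)
    return [j for j, bj in zip(cols, b) if abs(bj) > 1e-8]

def split_and_conquer_lasso(X, y, k, lam):
    """Stage 1: select within each of k column blocks (assumed split).
    Stage 2: merge the selected columns and select again."""
    p = len(X[0])
    blocks = [list(range(s, p, k)) for s in range(k)]  # interleaved blocks
    merged = sorted(j for blk in blocks for j in select(X, y, blk, lam))
    return select(X, y, merged, lam) if merged else []

# Synthetic data (hypothetical): 60 samples, 40 variables, 3 truly relevant.
random.seed(0)
n, p = 60, 40
true_vars = {0: 3.0, 5: -2.0, 17: 2.5}
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [sum(c * row[j] for j, c in true_vars.items()) + random.gauss(0, 0.1)
     for row in X]

selected = split_and_conquer_lasso(X, y, k=4, lam=0.3)
print(selected)
```

Under this reading, each first-stage fit handles only p/K variables, which is where the saving in computation would come from. The second stage refits on the merged candidates so that variables picked up spuriously within a single block can be dropped.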
Authors
兰晓然
张灏
Lan Xiaoran; Zhang Hao (School of Mathematics, Taiyuan University of Technology, Taiyuan 030024, China; Department of Mathematics, University of Arizona, Tucson, Arizona 85721, USA)
Source
《统计与决策》
CSSCI
PKU Core Journal (北大核心)
2018, Issue 12, pp. 64-67 (4 pages)
Statistics & Decision
Funding
National Natural Science Foundation of China (Grant No. 11571009)