Abstract
Selecting an important subset from a high-dimensional feature set is a common problem in statistics, and reducing model complexity by building a suitable sparse model has become an active research topic across many fields in recent years. High-dimensional independent variables are often highly linearly correlated and therefore exhibit a group structure. Taking this structure into account before the model is built, this paper proposes a new dimensionality reduction algorithm, the principal component Lasso based on variable clustering (VARCLUS), abbreviated VPLasso. Numerical simulations show that when the independent variables exhibit both between-group and within-group sparsity, the proposed algorithm outperforms classical group-structured feature selection algorithms in both the estimation accuracy of the regression parameters and the accuracy of variable selection.
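For readers unfamiliar with the terms, between-group and within-group sparsity can be written down with the standard Lasso and sparse group Lasso objectives. This is a reference sketch only: the symbols ($n$, $G$, $p_g$, $\lambda$, $\alpha$) are assumptions introduced here and are not taken from the paper, and the abstract does not state the VPLasso objective itself.

```latex
% Standard Lasso: an l1 penalty gives sparsity over individual coefficients.
\hat{\beta}^{\text{Lasso}}
  = \arg\min_{\beta}\;
    \frac{1}{2n}\,\lVert y - X\beta \rVert_2^2
    + \lambda \lVert \beta \rVert_1

% Sparse group Lasso: variables are split into G groups, with p_g variables
% in group g. The l2 term can zero out whole groups (between-group sparsity);
% the l1 term can zero out single coefficients inside a retained group
% (within-group sparsity). Here 0 <= alpha <= 1 balances the two penalties.
\hat{\beta}^{\text{SGL}}
  = \arg\min_{\beta}\;
    \frac{1}{2n}\,\Bigl\lVert y - \sum_{g=1}^{G} X^{(g)}\beta^{(g)} \Bigr\rVert_2^2
    + (1-\alpha)\,\lambda \sum_{g=1}^{G} \sqrt{p_g}\,\lVert \beta^{(g)} \rVert_2
    + \alpha\,\lambda\,\lVert \beta \rVert_1
```

Setting $\alpha = 1$ recovers the Lasso and $\alpha = 0$ recovers the group Lasso, which is why a setting with both kinds of sparsity is a natural test case for group-structured selection methods.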
Authors
Xu Yunjuan (许赟娟)
Luo Youxi (罗幼喜)
School of Science, Hubei University of Technology, Wuhan 430068, China
Source
Statistics & Decision (《统计与决策》)
CSSCI
Peking University Core Journal (北大核心)
2021, No. 4, pp. 31-36 (6 pages)
Funding
National Social Science Fund of China project (17BJY210)