Abstract
Motivated by the characteristics of high-dimensional data, this paper proposes a new and effective regularized estimation method for variable selection (also called feature extraction), built on the linear regression model and on dimension reduction through variable selection. The method explicitly accounts for the influence of the noise level (variance) of the data on the regularized estimate: while tracing the regularization path of the estimate, it also produces an effective estimate of the variance. Implementation details of the algorithm are derived from the KKT conditions of the underlying convex optimization problem and from the idea of the coordinate-wise algorithm. Simulation results indicate that the method improves the accuracy of both estimation and variable selection on high-dimensional data sets, making it an effective feature extraction method for high-dimensional data mining.
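The abstract refers to KKT conditions and a coordinate-wise algorithm without giving details. As a rough illustration only, the sketch below implements the standard coordinate-descent LASSO for the objective (1/(2n))‖y − Xβ‖² + λ‖β‖₁, together with a check of its KKT conditions. It is not the paper's method: the variance-estimation step described in the abstract is not reproduced here, and the function names `soft_threshold`, `lasso_cd`, and `kkt_violation` are hypothetical.

```python
def soft_threshold(z, t):
    """Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=200, tol=1e-10):
    """Generic coordinate-descent LASSO (illustrative, not the paper's algorithm).

    Minimizes (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 for list-of-rows X.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta, with beta starting at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column carries no information
            # Correlation of column j with the partial residual (beta_j excluded).
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
            max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


def kkt_violation(X, y, beta, lam):
    """Largest violation of the LASSO KKT conditions at beta.

    With g_j = (1/n) * x_j^T (y - X beta): g_j = lam * sign(beta_j) when
    beta_j != 0, and |g_j| <= lam when beta_j == 0.
    """
    n, p = len(X), len(X[0])
    resid = [y[i] - sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
    worst = 0.0
    for j in range(p):
        g = sum(X[i][j] * resid[i] for i in range(n)) / n
        if beta[j] == 0.0:
            v = max(0.0, abs(g) - lam)
        else:
            v = abs(g - (lam if beta[j] > 0 else -lam))
        worst = max(worst, v)
    return worst
```

Calling `lasso_cd(X, y, lam)` over a decreasing sequence of `lam` values traces a regularization path, and `kkt_violation` gives a simple way to confirm that each returned solution is optimal for the stated objective.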
Source
《宁夏大学学报(自然科学版)》
CAS
2012, No. 4, pp. 342-345, 349 (5 pages in total)
Journal of Ningxia University (Natural Science Edition)
Funding
Natural Science Foundation of Jiangsu Province (SBK200920379)
Natural Science Foundation of Nantong University (10Z008)
Keywords
data mining
high-dimensional data
variable selection
regularized estimation
least absolute shrinkage and selection operator (LASSO)
coordinate-wise algorithm