New Regularized Estimation in High-dimensional Data Mining

Abstract: Targeting the characteristics of high-dimensional data, this paper builds on the linear regression model and uses variable selection as a dimension reduction technique to propose a new and effective regularized estimation method for variable selection (also called feature extraction). The new method explicitly accounts for the influence of the noise (variance) in the data on regularized estimation, so the variance is estimated effectively while the regularization path of the estimator is traced. Implementation details of the algorithm are derived from the Karush-Kuhn-Tucker (KKT) conditions of the underlying convex optimization problem and the idea of coordinate-wise optimization. Simulation results show that the method improves the accuracy of both estimation and variable selection on high-dimensional data sets, making it a new and effective feature extraction method for high-dimensional data mining.
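The abstract describes the algorithm only as "KKT conditions plus coordinate-wise updates, with the noise variance estimated along the regularization path", so the exact objective is not available here. As a hedged illustration, the Python sketch below swaps in a known estimator of this kind, the scaled lasso of Sun and Zhang, which minimizes ||y - Xb||^2/(2n*sigma) + sigma/2 + lam*||b||_1 jointly over b and sigma; it is not claimed to be the authors' method. For fixed sigma this is an ordinary LASSO with penalty lam*sigma, whose KKT conditions ((1/n) x_j^T (y - Xb) = lam*sigma*sign(b_j) if b_j != 0, and |(1/n) x_j^T (y - Xb)| <= lam*sigma if b_j = 0) give the soft-thresholding coordinate update; for fixed b, sigma = ||y - Xb||/sqrt(n). The function names (soft_threshold, lasso_cd, scaled_lasso, scaled_lasso_path), the column standardization, the simulated data and the choice lam = sqrt(2 log p / n) are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical sketch: a scaled-lasso-style estimator (Sun & Zhang), NOT the
# paper's exact method. It jointly minimizes
#   ||y - X b||^2 / (2 n sigma) + sigma / 2 + lam * ||b||_1
# over the coefficients b and the noise level sigma.


def soft_threshold(z, t):
    """Soft-thresholding S(z, t) = sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_cd(X, y, penalty, beta, n_iter=500, tol=1e-10):
    """Cyclic coordinate descent for (1/(2n))||y - X b||^2 + penalty * ||b||_1.

    Assumes every column of X is centered and scaled so that (1/n) x_j^T x_j = 1.
    """
    n, p = X.shape
    beta = beta.copy()
    resid = y - X @ beta
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            old = beta[j]
            # KKT-based update: soft-threshold (1/n) x_j^T (partial residual).
            z = X[:, j] @ resid / n + old
            beta[j] = soft_threshold(z, penalty)
            delta = beta[j] - old
            if delta != 0.0:
                resid -= X[:, j] * delta  # keep the full residual up to date
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


def scaled_lasso(X, y, lam, beta0=None, sigma0=None, n_outer=100, tol=1e-8):
    """Alternate a LASSO step (penalty lam * sigma) with sigma = ||r|| / sqrt(n)."""
    n, p = X.shape
    beta = np.zeros(p) if beta0 is None else beta0
    sigma = np.linalg.norm(y) / np.sqrt(n) if sigma0 is None else sigma0
    for _ in range(n_outer):
        beta = lasso_cd(X, y, lam * sigma, beta)
        new_sigma = np.linalg.norm(y - X @ beta) / np.sqrt(n)
        converged = abs(new_sigma - sigma) < tol
        sigma = new_sigma
        if converged:
            break
    return beta, sigma


def scaled_lasso_path(X, y, lams):
    """Trace the path from strong to weak penalty, warm-starting b and sigma."""
    path, beta, sigma = [], None, None
    for lam in sorted(lams, reverse=True):
        beta, sigma = scaled_lasso(X, y, lam, beta, sigma)
        path.append((lam, beta.copy(), sigma))
    return path


# Usage on simulated data (all settings are illustrative assumptions).
rng = np.random.default_rng(0)
n, p = 100, 20
X = rng.standard_normal((n, p))
X = (X - X.mean(axis=0)) / X.std(axis=0)  # now (1/n) x_j^T x_j = 1
true_beta = np.zeros(p)
true_beta[:3] = [3.0, -2.0, 1.5]
y = X @ true_beta + 0.5 * rng.standard_normal(n)
y = y - y.mean()  # centered response, so no intercept is fitted
lam0 = np.sqrt(2 * np.log(p) / n)
beta_hat, sigma_hat = scaled_lasso(X, y, lam0)
print("selected:", np.flatnonzero(beta_hat), "sigma_hat:", round(sigma_hat, 3))
```

Warm-starting each penalty level from the previous solution, from the largest penalty down, follows the pathwise coordinate optimization idea of Friedman et al.; every point on the resulting path carries its own sigma estimate, loosely mirroring the variance-aware path the abstract describes. Another variance estimator or penalty scaling could be substituted without changing the coordinate-wise structure.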

Authors: Li Zean (李泽安), Chen Jianping (陈建平)

Affiliation: School of Computer Science and Technology, Nantong University

Source: Journal of Ningxia University (Natural Science Edition) (CAS), 2012, No. 4, pp. 342-345 and 349 (5 pages)

Funding: Natural Science Foundation of Jiangsu Province (SBK200920379); Natural Science Foundation of Nantong University (10Z008)

Keywords: data mining; high-dimensional data; variable selection; regularized estimation; LASSO (least absolute shrinkage and selection operator); coordinate-wise algorithm

CLC number: TP312 [Automation and Computer Technology: Computer Software and Theory]

References

[1] HASTIE T, TIBSHIRANI R, FRIEDMAN J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction [M]. FAN Ming, CHAI Yumei, ZAN Hongying, trans. Beijing: Publishing House of Electronics Industry, 2004.
[2] BÜHLMANN P, VAN DE GEER S. Statistics for High-Dimensional Data: Methods, Theory and Applications [M]. Berlin: Springer-Verlag Berlin Heidelberg, 2011.
[3] LI Xin, QIAN Xu, WANG Ziqiang. An efficient algorithm for high-dimensional outlier data mining [J]. Computer Engineering, 2010, 36(21): 34-36.
[4] TIBSHIRANI R. Regression shrinkage and selection via the LASSO [J]. Journal of the Royal Statistical Society, Series B: Statistical Methodology, 1996, 58(1): 267-288.
[5] EFRON B, HASTIE T, JOHNSTONE I, et al. Least angle regression [J]. Annals of Statistics, 2004, 32(2): 407-489.
[6] ZOU Hui. The adaptive LASSO and its oracle properties [J]. Journal of the American Statistical Association, 2006, 101(476): 1418-1429. DOI: 10.1198/016214506000000735.
[7] FRIEDMAN J, HASTIE T, HÖFLING H, et al. Pathwise coordinate optimization [J]. The Annals of Applied Statistics, 2007, 1(2): 302-332.
