Abstract
Feature screening is a fast and effective dimension-reduction approach for ultrahigh-dimensional data. This paper proposes an improved feature screening method for ultrahigh-dimensional discriminant classification data. The proposed procedure is model-free: it requires no specification of the model structure. It handles multi-class response variables, applies to both discrete and continuous covariates, and remains robust when covariates follow heavy-tailed distributions. The paper proves that the proposed method satisfies the sure screening property and the ranking consistency property, and numerical simulations and a real-data analysis verify its effectiveness in finite samples.
Authors
来鹏 (LAI Peng)
沈宝华 (SHEN Bao-hua)
宋凤丽 (SONG Feng-li)
School of Mathematics & Statistics, Nanjing University of Information Science & Technology, Nanjing 210044, China
Source
Mathematics in Practice and Theory (《数学的实践与认识》)
PKU Core Journal (北大核心)
2018, No. 9, pp. 154-162 (9 pages)
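Funding
National Natural Science Foundation of China (11771215)
Major Program of the National Social Science Foundation of China (16ZDA047, 17ZDA092)
Natural Science Foundation of Jiangsu Province (BK20161530, BK20140983)
Jiangsu Province "Qing Lan Project" (2016)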
Keywords
feature screening
conditional distribution function
sure screening property
ranking consistency property