Abstract
Variable screening for ultrahigh-dimensional data plays an important part in statistical research. In this paper, we introduce a new robust rank conditional feature screening method (RRCSIS, for short) based on the rank of a random variable. The proposed procedure does not depend on any model specification and handles both conditional feature screening and feature screening in a unified way. Simulation studies show that RRCSIS remains robust when either the response or the covariates follow heavy-tailed distributions or contain outliers, and that it clearly outperforms other screening methods. In addition, we propose an iterative screening procedure (IRRCSIS, for short) to detect important predictors that are marginally uncorrelated with, but jointly correlated with, the response. Finally, we illustrate the effectiveness of the method through a real data example.
Authors
李向杰
张景肖
LI Xiang-jie; ZHANG Jing-xiao (Center for Applied Statistics, Renmin University of China, Beijing 100872, China; School of Statistics, Renmin University of China, Beijing 100872, China)
Source
《统计与信息论坛》
CSSCI
Peking University Core Journal (北大核心)
2018, No. 4, pp. 6-12 (7 pages)
Journal of Statistics and Information
Funding
Research Funds of Renmin University of China (supported by the Fundamental Research Funds for the Central Universities), Project 17XNH088
Keywords
conditional feature screening
ultrahigh-dimensional data
robust rank
model-free