Abstract
Because ultrahigh-dimensional survival data are subject to censoring, most variable screening methods designed for ultrahigh-dimensional complete data are no longer applicable. Most variable screening methods retain all important variables with high probability, that is, they have the sure screening property, but they fail to control the false discovery rate (FDR) well. It is therefore particularly important to find a dimension reduction method that balances the interpretability and stability of the model. This paper studies model-free variable screening based on correlation rank ordering, together with FDR control, for ultrahigh-dimensional survival data, and proposes a two-step procedure that uses Knockoff covariates to specify the screening threshold, so that the FDR can be controlled at a pre-specified level α. Numerical simulations and an empirical analysis show that, when the FDR level α is greater than or equal to 1/s (where s is the number of important variables), the proposed two-step CR-Knockoff procedure achieves both sure screening and FDR control.
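The condition α ≥ 1/s can be illustrated with the generic knockoff+ threshold rule from the general knockoff literature. The sketch below is an assumption-laden illustration, not the paper's CR-Knockoff procedure: the statistics `w` (large positive values favouring the original covariate over its knockoff copy) and the function names are hypothetical, and the abstract does not say how the paper computes its statistics.

```python
import numpy as np


def knockoff_plus_threshold(w, alpha):
    """Generic knockoff+ rule (assumed here, not taken from the paper).

    Returns the smallest t among the nonzero |w_j| such that
    (1 + #{j: w_j <= -t}) / max(1, #{j: w_j >= t}) <= alpha,
    or infinity if no such t exists.
    """
    w = np.asarray(w, dtype=float)
    candidates = np.unique(np.abs(w[w != 0]))  # sorted ascending
    for t in candidates:
        fdp_hat = (1 + np.sum(w <= -t)) / max(1, np.sum(w >= t))
        if fdp_hat <= alpha:
            return float(t)
    return float("inf")


def select(w, alpha):
    """Indices of covariates whose statistic reaches the threshold."""
    w = np.asarray(w, dtype=float)
    return np.flatnonzero(w >= knockoff_plus_threshold(w, alpha))


# Hypothetical statistics: four clearly positive values, two near zero.
w_demo = [5.0, 4.0, 3.0, 2.0, -1.0, 0.5]
print(select(w_demo, alpha=0.25))  # four covariates are selected
print(select(w_demo, alpha=0.20))  # nothing is selected
```

Because of the `1 +` in the numerator, the estimated false discovery proportion is never below 1 divided by the number of selected covariates, so selecting s covariates under this rule requires α ≥ 1/s. In the toy data four covariates stand out, and the selection becomes empty once α drops below 1/4.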
Authors
潘莹丽
赵晓洛
张淑莹
刘展
Pan Yingli; Zhao Xiaoluo; Zhang Shuying; Liu Zhan (Faculty of Mathematics and Statistics, Hubei University, Wuhan 430062, China; Hubei Key Laboratory of Applied Mathematics, Hubei University, Wuhan 430062, China; School of Statistics and Mathematics, Zhongnan University of Economics and Law, Wuhan 430073, China)
Source
《统计与决策》
CSSCI
Peking University Core Journal
2023, No. 19, pp. 47-52 (6 pages in total)
Statistics & Decision
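Funding
Open Fund Project of the Hubei Key Laboratory of Big Data in Science and Technology (Wuhan Documentation and Information Center, Chinese Academy of Sciences) (E3KF291001)
Hubei University Professional Degree Graduate Course Case Library Construction Project (104017544).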