Funding: Supported by the National Natural Science Foundation of China (11528102) and the National Institutes of Health (U01CA209414).
Abstract: Many modern biomedical studies have yielded survival data with high-throughput predictors. The goals of scientific research often lie in identifying predictive biomarkers, understanding biological mechanisms, and making accurate and precise predictions. Variable screening is a crucial first step in achieving these goals. This work conducts a selective review of feature screening procedures for survival data with ultrahigh-dimensional covariates. We present the main methodologies, along with the key conditions that ensure sure screening properties. The practical utility of these methods is examined via extensive simulations. We conclude the review with some future opportunities in this field.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 11301435 and 71131008) and the Fundamental Research Funds for the Central Universities.
Abstract: Sure independence screening (SIS) has been proposed to reduce ultrahigh dimensionality to a moderate scale and has been proved to enjoy the sure screening property under Gaussian linear models. In practice, however, the observed response is often skewed or heavy-tailed with extreme values, which may severely degrade the performance of SIS. To address this, we propose a new robust sure independence screening (RoSIS) procedure based on the correlation between each predictor and the distribution function of the response. The proposed approach contributes to the literature in three ways. First, it reduces ultrahigh dimensionality effectively. Second, it is robust to heavy tails or extreme values in the response. Third, it possesses both the sure screening property and the ranking consistency property under milder conditions. Furthermore, we demonstrate its excellent finite sample performance through numerical simulations and a real data example.
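The screening idea in this abstract can be illustrated with a minimal sketch. It assumes the marginal utility is the absolute sample correlation between each predictor and the empirical distribution function of the response; the abstract does not give the exact statistic, so the paper's definition may differ. The function names `rosis_rank` and `demo`, the simulated model, and the cutoff n / log(n) are illustrative assumptions, not taken from the source.

```python
import numpy as np

def rosis_rank(X, y):
    """Rank predictors by |corr(X_j, F_n(Y))|, with F_n the empirical CDF of y.

    Hypothetical helper: a sketch of the idea, not the authors' implementation.
    """
    n = len(y)
    # Empirical CDF at each observation: F_n(y_i) = #{k : y_k <= y_i} / n
    F = np.searchsorted(np.sort(y), y, side="right") / n
    Xc = X - X.mean(axis=0)
    Fc = F - F.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (Fc ** 2).sum())
    denom = np.where(denom == 0, np.inf, denom)  # constant columns get utility 0
    utility = np.abs(Xc.T @ Fc) / denom
    order = np.argsort(-utility)  # predictor indices, most relevant first
    return utility, order

def demo():
    """Simulated example with heavy-tailed (Cauchy) noise; assumed setup."""
    rng = np.random.default_rng(0)
    n, p = 200, 1000
    X = rng.standard_normal((n, p))
    y = 3 * X[:, 0] - 2 * X[:, 5] + rng.standard_cauchy(n)
    _, order = rosis_rank(X, y)
    d = int(n / np.log(n))  # a commonly used screening size, assumed here
    return order[:d]
```

Because the response enters only through its ranks, a few extreme values of `y` cannot dominate the utility, which is the robustness the abstract describes.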
Funding: Supported by the National Natural Science Foundation of China (11571112), the National Social Science Foundation Key Program (17ZDA091), the Natural Science Fund of Education Department of Anhui Province (KJ2013B233), and the 111 Project of China (B14019).
Abstract: Ultra-high-dimensional data with grouping structures arise naturally in many contemporary statistical problems, such as gene-wide association studies and multi-factor analysis of variance (ANOVA). To address this issue, we propose a group screening method that performs variable selection on groups of variables in linear models. The method is based on working independence, and its sure screening property is established. To enhance the finite sample performance, a data-driven thresholding rule and a two-stage iterative procedure are developed. To the best of our knowledge, screening for grouped variables has rarely appeared in the literature, and this method can be regarded as an important and non-trivial extension of screening for individual variables. An extensive simulation study and a real data analysis demonstrate its finite sample performance.
Funding: Supported by the National Natural Science Foundation of China under Grant No. 11901006 and the Natural Science Foundation of Anhui Province under Grant Nos. 1908085QA06 and 1908085MA20.
Abstract: This paper proposes a new sure independence screening procedure for high-dimensional survival data based on censored quantile correlation (CQC). This framework has two distinctive features: 1) by incorporating a weighting scheme, our metric is a natural extension of the quantile correlation (QC) considered by Li (2015) to high-dimensional survival data; 2) the proposed method is not only robust against outliers but can also discover nonlinear relationships between the independent variables and the censored dependent variable. Additionally, the proposed method enjoys the sure screening property under certain technical conditions. Simulation results demonstrate that the proposed method performs competitively on survival datasets with high-dimensional predictors.
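For orientation, the uncensored quantile correlation that the abstract extends is commonly defined as below. This is a sketch under stated assumptions: the notation (the quantile level \tau, the \tau-th quantile Q_{\tau,Y} of Y, and the function \psi_\tau) is not given in the abstract, and the weighting scheme that turns QC into CQC is not specified there, so it is not reproduced here.

```latex
\psi_\tau(w) = \tau - \mathbf{1}(w < 0), \qquad
\operatorname{qcov}_\tau(Y, X) = \mathrm{E}\bigl[\psi_\tau(Y - Q_{\tau,Y})\,(X - \mathrm{E}X)\bigr],
\qquad
\operatorname{qcor}_\tau(Y, X) =
  \frac{\operatorname{qcov}_\tau(Y, X)}{\sqrt{(\tau - \tau^{2})\,\operatorname{Var}(X)}}.
```

Since \psi_\tau depends on Y only through the sign of Y - Q_{\tau,Y}, the measure is insensitive to outliers in the response, which is consistent with the robustness claimed in the abstract.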
Funding: Supported by the National Natural Science Foundation of China (Grant No. 11701034) and the National Science Foundation of USA (Grant No. DMS1820702).
Abstract: In this paper, we propose a new correlation, called stable correlation, to measure the dependence between two random vectors. The new correlation is well defined without any moment condition and is zero if and only if the two random vectors are independent. We also study its other theoretical properties. Based on the new correlation, we further propose a robust model-free feature screening procedure for ultrahigh-dimensional data and establish its sure screening property and rank consistency property without imposing the subexponential or sub-Gaussian tail condition that is commonly required in the feature screening literature. We also examine the finite sample performance of the proposed robust feature screening procedure via Monte Carlo simulation studies and illustrate the procedure with a real data example.