6 articles found
Robust Sure Independence Screening for Ultrahigh Dimensional Non-normal Data (Cited by: 1)
1
Author: Wei ZHONG. Acta Mathematica Sinica, English Series (SCIE, CSCD), 2014, No. 11, pp. 1885-1896 (12 pages)
Sure independence screening (SIS) has been proposed to reduce ultrahigh dimensionality down to a moderate scale and has been proved to enjoy the sure screening property under Gaussian linear models. However, the observed response is often skewed or heavy-tailed with extreme values in practice, which may dramatically deteriorate the performance of SIS. To this end, we propose a new robust sure independence screening (RoSIS) procedure that considers the correlation between each predictor and the distribution function of the response. The proposed approach contributes to the literature in three ways. First, it is able to reduce ultrahigh dimensionality effectively. Second, it is robust to heavy tails or extreme values in the response. Third, it possesses both the sure screening property and the ranking consistency property under milder conditions. Furthermore, we demonstrate its excellent finite sample performance through numerical simulations and a real data example.
Keywords: robustness; sure independence screening; sure screening property; ultrahigh dimensionality; variable selection
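The screening idea in this abstract, ranking predictors by their correlation with the distribution function of the response, can be illustrated with a minimal sketch. It assumes the marginal utility is the absolute Pearson correlation between each predictor and the empirical distribution function of the response; the exact statistic and thresholding rule in the paper may differ, and the names `rosis_utilities` and `rosis_screen` are hypothetical.

```python
import numpy as np

def rosis_utilities(X, y):
    """Absolute correlation between each column of X and F_n(y).

    Assumption: F_n is the empirical distribution function of the response,
    evaluated at each observation (rank / n). This is an illustrative
    reading of the abstract, not the paper's exact estimator.
    """
    n = X.shape[0]
    ranks = np.argsort(np.argsort(y)) + 1   # ranks 1..n (no ties assumed)
    F = ranks / n                           # F_n(y_i)
    Xc = X - X.mean(axis=0)                 # center each predictor
    Fc = F - F.mean()                       # center the transformed response
    num = Xc.T @ Fc
    den = np.sqrt((Xc ** 2).sum(axis=0) * (Fc ** 2).sum())
    return np.abs(num / den)

def rosis_screen(X, y, d):
    """Indices of the d predictors with the largest utilities."""
    return np.argsort(-rosis_utilities(X, y))[:d]
```

Because the response enters only through its ranks, the utilities are unchanged by any strictly increasing transformation of `y`, which is one way to see why such a statistic is insensitive to heavy tails or extreme response values.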
Nonparametric Feature Screening via the Variance of the Regression Function
2
Authors: Won Chul Song, Michael G. Akritas. Open Journal of Statistics, 2024, No. 4, pp. 413-438 (26 pages)
This article develops a procedure for screening variables in ultrahigh-dimensional settings based on their predictive significance. This is achieved by ranking the variables according to the variance of their respective marginal regression functions (RV-SIS). We show that, under some mild technical conditions, RV-SIS possesses the sure screening property as defined by Fan and Lv (2008). Numerical comparisons suggest that RV-SIS performs competitively with other screening procedures and outperforms them in many different model settings.
Keywords: Sure Independence Screening; Nonparametric Regression; Ultrahigh-Dimensional Data; Variable Selection
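Ranking variables by the variance of their marginal regression functions can be sketched as follows. This is only an illustration under stated assumptions: the marginal regression function E[Y | X_j] is approximated here by response means within equal-count bins of X_j, which is a stand-in estimator chosen for brevity and not necessarily the one used in the article. The name `rv_utilities` and the parameter `n_bins` are hypothetical.

```python
import numpy as np

def rv_utilities(X, y, n_bins=10):
    """Estimated Var(E[Y | X_j]) for each predictor j.

    Assumption: E[Y | X_j] is approximated by the mean of y within
    equal-count bins of X_j (a crude nonparametric estimate).
    """
    n, p = X.shape
    y_bar = y.mean()
    util = np.empty(p)
    for j in range(p):
        order = np.argsort(X[:, j])            # sort observations by X_j
        bins = np.array_split(order, n_bins)   # equal-count bins
        means = np.array([y[idx].mean() for idx in bins])
        weights = np.array([len(idx) for idx in bins]) / n
        # Weighted variance of the bin means around the overall mean.
        util[j] = np.sum(weights * (means - y_bar) ** 2)
    return util
```

A utility of this kind can detect a predictor whose effect is nonlinear, for example a purely quadratic one, where a linear correlation with the response would be close to zero.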
Ultra-High Dimensional Feature Selection and Mean Estimation under Missing at Random
3
Authors: Wanhui Li, Guangming Deng, Dong Pan. Open Journal of Statistics, 2023, No. 6, pp. 850-871 (22 pages)
Next Generation Sequencing (NGS) provides an effective basis for estimating the survival time of cancer patients, but it also poses the problem of high data dimensionality. In addition, some patients drop out of the study, which leaves the data with missing values, so a method is needed for estimating the mean of a response variable with missing values in ultra-high dimensional datasets. In this paper, we propose a two-stage ultra-high dimensional variable screening method, RF-SIS, based on random forest regression, which effectively solves the problem of estimating missing values due to excessive data dimension. After the dimension reduction by RF-SIS, mean interpolation is carried out on the missing responses. The results on simulated data show that, compared with the estimation method of directly deleting missing observations, the estimation results of RF-SIS-MI have significant advantages in terms of the proportion of intervals covered, the average length of intervals, and the average absolute deviation.
Keywords: Ultrahigh-Dimensional Data; Missing Data; Sure Independent Screening; Mean Estimation
Volatile Compounds Selection via Quantile Correlation and Composite Quantile Correlation: A Whiting Case Study (Cited by: 1)
4
Authors: Ibrahim Sidi Zakari, Assi N’guessan, Alexandre Dehaut, Guillaume Duflos. Open Journal of Statistics, 2016, No. 6, pp. 995-1002 (9 pages)
The freshness and quality indices of whiting (Merlangius merlangus), which are influenced by a large number of chemical volatile compounds, are analyzed here in order to select the most relevant compounds as predictors for these indices. The selection was performed by means of recent statistical variable selection methods, namely robust model-free feature screening based on quantile correlation and composite quantile correlation. On the one hand, the compounds 2-Methyl-1-butanol, 3-Methyl-1-butanol, Ethanol, Trimethylamine, 3-Methyl butanal, 2-Methyl-1-propanol, Ethylacetate, 1-Butanol and 2,3-Butanedione were identified as major predictors for the freshness index. On the other hand, the compounds 3-Methyl-1-butanol, 2-Methyl-1-butanol, Ethanol, 3-Methyl butanal, 3-Hydroxy-2-butanone, 1-Butanol, 2,3-Butanedione, 3-Pentanol, 3-Pentanone and 2-Methyl-1-propanol were identified as major predictors for the quality index.
Keywords: Volatile Compounds; Freshness and Spoilage Indices; Quantile Correlation; Composite Quantile Correlation; Sure Independence Screening
A selective overview of feature screening for ultrahigh-dimensional data (Cited by: 10)
5
Authors: LIU JingYuan, ZHONG Wei, LI RunZe. Science China Mathematics (SCIE, CSCD), 2015, No. 10, pp. 2033-2054 (22 pages)
High-dimensional data have frequently been collected in many scientific areas, including genome-wide association studies, biomedical imaging, tomography, tumor classification, and finance. Analysis of high-dimensional data poses many challenges for statisticians. Feature selection and variable selection are fundamental for high-dimensional data analysis. The sparsity principle, which assumes that only a small number of predictors contribute to the response, is frequently adopted and deemed useful in the analysis of high-dimensional data. Following this general principle, a large number of variable selection approaches via penalized least squares or likelihood have been developed in the recent literature to estimate a sparse model and select significant variables simultaneously. While penalized variable selection methods have been successfully applied in many high-dimensional analyses, modern applications in areas such as genomics and proteomics push the dimensionality of data to an even larger scale, where the dimension of the data may grow exponentially with the sample size. This has been called ultrahigh-dimensional data in the literature. This work aims to present a selective overview of feature screening procedures for ultrahigh-dimensional data. We focus on insights into how to construct marginal utilities for feature screening under specific models and on the motivation for model-free feature screening procedures.
Keywords: correlation learning; distance correlation; sure independence screening; sure joint screening; sure screening property; ultrahigh-dim
Model-free feature screening for high-dimensional survival data (Cited by: 2)
6
Authors: Yuanyuan Lin, Xianhui Liu, Meiling Hao. Science China Mathematics (SCIE, CSCD), 2018, No. 9, pp. 1617-1636 (20 pages)
With the rapid growth in the size of scientific data in various disciplines, feature screening plays an important role in reducing high dimensionality to a moderate scale in many scientific fields. In this paper, we introduce a unified and robust model-free feature screening approach for high-dimensional survival data with censoring, which has several advantages. It is a model-free approach under a general model framework, and hence avoids the complication of specifying an actual model form with a huge number of candidate variables. Under mild conditions, without requiring the existence of any moment of the response, it enjoys the ranking consistency and sure screening properties in ultra-high dimension. In particular, we impose a conditional independence assumption on the response and the censoring variable given each covariate, instead of assuming that the censoring variable is independent of the response and the covariates. Moreover, we also propose a more robust variant of the new procedure, which possesses desirable theoretical properties without any finite moment condition on the predictors and the response. The computation of the newly proposed methods does not require any complicated numerical optimization, and it is fast and easy to implement. Extensive numerical studies demonstrate that the proposed methods perform competitively for various configurations. The application is illustrated with an analysis of a genetic data set.
Keywords: feature screening; random censoring; robustness; sure independence screening; ultra-high dimension