6 articles found
Model-Free Feature Screening via Maximal Information Coefficient (MIC) for Ultrahigh-Dimensional Multiclass Classification
1
Authors: Tingting Chen, Guangming Deng. Open Journal of Statistics, 2023, No. 6, pp. 917-940 (24 pages)
It is common for datasets to contain both categorical and continuous variables. However, many feature screening methods designed for high-dimensional classification assume that the variables are continuous, which limits their applicability to mixed data. To address this issue, we propose a model-free feature screening approach for ultrahigh-dimensional multiclass classification that can handle both categorical and continuous variables. The proposed method uses the Maximal Information Coefficient (MIC) to assess the predictive power of each variable. Under certain regularity conditions, we prove that the screening procedure possesses the sure screening and ranking consistency properties. We conduct simulation studies and provide real data analysis examples to demonstrate its finite-sample performance.
Keywords: Ultrahigh-Dimensional; Feature Screening; Model-Free; Maximal Information Coefficient (MIC); Multiclass Classification
Model-Free Feature Screening Based on Gini Impurity for Ultrahigh-Dimensional Multiclass Classification
2
Authors: Zhongzheng Wang, Guangming Deng. Open Journal of Statistics, 2022, No. 5, pp. 711-732 (22 pages)
It is quite common for both categorical and continuous covariates to appear in the same dataset. However, most feature screening methods for ultrahigh-dimensional classification assume that the covariates are continuous, and few screening methods apply to mixed covariates. To handle this non-trivial situation, we propose a model-free feature screening method for ultrahigh-dimensional multiclass classification with both categorical and continuous covariates. The proposed method uses Gini impurity to evaluate the predictive power of each covariate. Under certain regularity conditions, we prove that the screening procedure possesses the sure screening and ranking consistency properties. We demonstrate the finite-sample performance of the procedure through simulation studies and illustrate it with a real data analysis.
Keywords: Ultrahigh-Dimensional; Feature Screening; Model-Free; Gini Impurity; Multiclass Classification
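As a rough illustration of screening by Gini impurity, the sketch below scores each covariate by how much conditioning on it reduces the Gini impurity of the class label, then keeps the top-ranked covariates. It is a minimal sketch under stated assumptions, not the authors' statistic: the function names (`gini_impurity`, `gini_gain`, `quantile_bins`, `screen`), the rank-based binning of continuous covariates, and the default of four bins are all hypothetical choices made for this example.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity 1 - sum_k p_k^2 of a sequence of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def gini_gain(x_levels, y):
    """Drop in the Gini impurity of y after conditioning on a categorical x."""
    n = len(y)
    groups = {}
    for level, label in zip(x_levels, y):
        groups.setdefault(level, []).append(label)
    conditional = sum(len(g) / n * gini_impurity(g) for g in groups.values())
    return gini_impurity(y) - conditional


def quantile_bins(x, n_bins=4):
    """Map continuous values to bin indices 0..n_bins-1 by rank.

    Assumed slicing rule for this sketch; the paper may discretize differently.
    """
    order = sorted(range(len(x)), key=lambda i: x[i])
    bins = [0] * len(x)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(x)
    return bins


def screen(features, y, top_d, continuous=()):
    """Rank covariates by Gini gain; return the top_d names and all scores."""
    scores = {}
    for name, values in features.items():
        levels = quantile_bins(values) if name in continuous else values
        scores[name] = gini_gain(levels, y)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_d], scores
```

Here `features` is a dictionary mapping covariate names to columns, and `continuous` lists the names that should be binned before scoring. A covariate that separates the classes perfectly attains a gain equal to the impurity of the labels, while a covariate independent of the labels scores close to zero.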
Semiparametric Model Averaging for Ultrahigh-Dimensional Conditional Quantile Prediction
3
Authors: Chao Hui GUO, Jing LV, Hu YANG, Jing Wen TU, Chen Xiao TIAN. Acta Mathematica Sinica, English Series (SCIE, CSCD), 2023, No. 6, pp. 1171-1202 (32 pages)
In this paper, we develop a flexible semiparametric model averaging marginal regression procedure to forecast the joint conditional quantile function of the response variable for ultrahigh-dimensional data. First, we approximate the joint conditional quantile function by a weighted average of one-dimensional marginal conditional quantile functions that have varying coefficient structures. Then, a local linear regression technique is employed to derive consistent estimates of the marginal conditional quantile functions. Second, based on the estimated marginal conditional quantile functions, we estimate and select the significant model weights involved in the approximation by a nonconvex penalized quantile regression. Under some relaxed conditions, we establish the asymptotic properties of the nonparametric kernel estimators and the oracle estimators of the model averaging weights. We further derive the oracle property for the proposed nonconvex penalized model averaging procedure. Finally, simulation studies and a real data analysis are conducted to illustrate the merits of the proposed model averaging method.
Keywords: Kernel Regression; Model Averaging; Oracle Property; Penalized Quantile Regression; Ultrahigh-Dimensional Data
Conditional-quantile screening for ultrahigh-dimensional survival data via martingale difference correlation (cited by: 1)
4
Authors: Kai Xu, Xudong Huang. Science China Mathematics (SCIE, CSCD), 2018, No. 10, pp. 1907-1922 (16 pages)
Using the so-called martingale difference correlation (MDC), we propose a novel censored-conditional-quantile screening approach for ultrahigh-dimensional survival data with heterogeneity (which is often present in such data). By incorporating a weighting scheme, this method is a natural extension of the MDC-based conditional quantile screening considered by Shao and Zhang (2014) to ultrahigh-dimensional survival data. The proposed screening procedure has a sure-screening property under certain technical conditions and an excellent capability of detecting nonlinear relationships between the independent variables and the censored dependent variable. Both simulation results and an analysis of real data demonstrate the effectiveness of the new censored-conditional-quantile screening procedure.
Keywords: Ultrahigh-Dimensional Survival Data; Martingale Difference Correlation; Censored-Conditional-Quantile Screening; Sure-Screening Property
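To make the MDC-based quantile screening idea concrete, the sketch below computes a sample squared martingale difference correlation between a covariate and the quantile indicator tau - 1{Y <= q_tau}, following the general form of the sample statistic in Shao and Zhang (2014). It is a simplified sketch for uncensored, scalar data only: the censoring weights that the abstract describes are not implemented, the plain sample quantile is an assumption, and the function names (`mdd_sq`, `dvar_sq`, `mdc_sq`, `quantile_score`) are hypothetical.

```python
import math


def mdd_sq(u, v):
    """Sample squared martingale difference divergence of v given scalar u."""
    n = len(u)
    v_bar = sum(v) / n
    total = 0.0
    for k in range(n):
        for l in range(n):
            total += (v[k] - v_bar) * (v[l] - v_bar) * abs(u[k] - u[l])
    return -total / n ** 2


def dvar_sq(u):
    """Sample squared distance variance of u from double-centred distances."""
    n = len(u)
    a = [[abs(u[k] - u[l]) for l in range(n)] for k in range(n)]
    row = [sum(r) / n for r in a]
    grand = sum(row) / n
    return sum(
        (a[k][l] - row[k] - row[l] + grand) ** 2
        for k in range(n)
        for l in range(n)
    ) / n ** 2


def mdc_sq(u, v):
    """Squared MDC: MDD^2 scaled by var(v) * sqrt(dVar^2(u)); 0 if degenerate."""
    n = len(v)
    v_bar = sum(v) / n
    var_v = sum((x - v_bar) ** 2 for x in v) / n
    denom = var_v * math.sqrt(dvar_sq(u))
    return mdd_sq(u, v) / denom if denom > 0 else 0.0


def quantile_score(x, y, tau):
    """MDC^2 between tau - 1{y <= q_tau} and x, with q_tau the sample quantile.

    Censoring is ignored here; the paper's weighting scheme is not reproduced.
    """
    q = sorted(y)[max(0, math.ceil(tau * len(y)) - 1)]
    v = [tau - (1.0 if yi <= q else 0.0) for yi in y]
    return mdc_sq(x, v)
```

Covariates would then be ranked by `quantile_score` at the quantile level of interest, with the largest scores retained. Because the score depends on the response only through the quantile indicator, it can pick up nonlinear dependence that a linear correlation would miss.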
Nonparametric Feature Screening via the Variance of the Regression Function
5
Authors: Won Chul Song, Michael G. Akritas. Open Journal of Statistics, 2024, No. 4, pp. 413-438 (26 pages)
This article develops a procedure for screening variables in ultrahigh-dimensional settings based on their predictive significance. This is achieved by ranking the variables according to the variance of their respective marginal regression functions (RV-SIS). We show that, under some mild technical conditions, RV-SIS possesses the sure screening property defined by Fan and Lv (2008). Numerical comparisons suggest that RV-SIS is competitive with other screening procedures and outperforms them in many different model settings.
Keywords: Sure Independence Screening; Nonparametric Regression; Ultrahigh-Dimensional Data; Variable Selection
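The ranking idea in this abstract can be sketched as follows: estimate each marginal regression function E[Y | X_j] nonparametrically, then score the covariate by the sample variance of the fitted values. This is an illustrative sketch, not the estimator used in the paper. The Nadaraya-Watson smoother with a Gaussian kernel, the fixed bandwidth of 0.5, the assumption that covariates are on comparable scales, and the names `nw_fit`, `regression_variance`, and `rv_screen` are all assumptions made here.

```python
import math


def nw_fit(x, y, bandwidth):
    """Nadaraya-Watson estimate of E[y | x] at each observed x (Gaussian kernel)."""
    fitted = []
    for x0 in x:
        w = [math.exp(-0.5 * ((xi - x0) / bandwidth) ** 2) for xi in x]
        # sum(w) >= 1 because the point x0 itself contributes weight 1.
        fitted.append(sum(wi * yi for wi, yi in zip(w, y)) / sum(w))
    return fitted


def regression_variance(x, y, bandwidth=0.5):
    """Sample variance of the fitted marginal regression function."""
    m = nw_fit(x, y, bandwidth)
    m_bar = sum(m) / len(m)
    return sum((mi - m_bar) ** 2 for mi in m) / len(m)


def rv_screen(columns, y, top_d, bandwidth=0.5):
    """Rank covariates by regression-function variance; keep the top_d names."""
    scores = {
        name: regression_variance(col, y, bandwidth)
        for name, col in columns.items()
    }
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_d], scores
```

A covariate whose regression function is flat receives a score near zero, whereas one with a strongly varying regression function, linear or not, scores high. In practice the covariates would typically be standardized and the bandwidth chosen from the data rather than fixed.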
Ultra-High Dimensional Feature Selection and Mean Estimation under Missing at Random
6
Authors: Wanhui Li, Guangming Deng, Dong Pan. Open Journal of Statistics, 2023, No. 6, pp. 850-871 (22 pages)
Next Generation Sequencing (NGS) provides an effective basis for estimating the survival time of cancer patients, but it also poses the problem of high data dimensionality. In addition, some patients drop out of the study, which leaves responses missing, so a method is needed for estimating the mean of a response variable with missing values in ultra-high dimensional datasets. In this paper, we propose a two-stage ultra-high dimensional variable screening method, RF-SIS, based on random forest regression, which addresses the difficulty of estimating missing values when the data dimension is excessive. After dimension reduction by RF-SIS, mean interpolation is applied to the missing responses. Simulation results show that, compared with the estimation method that directly deletes missing observations, RF-SIS-MI has significant advantages in terms of the proportion of intervals covered, the average interval length, and the average absolute deviation.
Keywords: Ultrahigh-Dimensional Data; Missing Data; Sure Independent Screening; Mean Estimation