
Model-Free feature screening via a distance correlation for ultrahigh dimensional survival data
Abstract With the arrival of the big data era and the explosive growth of data dimensionality, dimension reduction for ultrahigh dimensional data has become a major topic in many research fields. Because the response variable is often right-censored, dimension reduction methods designed for ultrahigh dimensional complete data are no longer applicable to right-censored data. This study proposes a new feature screening method based on distance correlation that handles ultrahigh dimensional right-censored data effectively. We first use the distance correlation coefficient to compute the marginal effect of each covariate on the response variable and construct a screening index from that coefficient. We then perform feature screening according to a pre-established screening criterion. The proposed method does not depend on any model structure assumption, so it avoids the adverse consequences of model misspecification. In addition, the distance covariance estimator used in the method is an unbiased estimator of the population distance covariance, which gives it high statistical accuracy and computational precision. Simulation and empirical studies show that the proposed method quickly eliminates covariates that are only weakly related to the response variable while retaining all important variables, thereby reducing the dimension of the parameter.
Authors PAN Yingli; WANG Haoyu; YU Jiali; LIU Zhan (Hubei Key Laboratory of Applied Mathematics, Faculty of Mathematics and Statistics, Hubei University, Wuhan 430062, China)
Source Journal of Hubei University: Natural Science, CAS, 2024, No. 1, pp. 122-132 (11 pages)
Funding Supported by the Youth Program of the National Natural Science Foundation of China, project "Research on Distributed Optimization Methods Based on Expectile Regression Models" (11901175).
Keywords ultrahigh dimensional data; survival data; distance correlation; Model-Free feature screening
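
The screening procedure summarized in the abstract (score each covariate by its distance correlation with the response, then keep the top-ranked covariates) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say how right censoring enters the screening index, so the sketch assumes a fully observed response. It uses the U-centered (bias-corrected) distance covariance estimator as one standard unbiased estimator. The function names (`u_centered`, `inner`, `dcor2`, `screen`), the toy data, and the cutoff of n / log(n) retained covariates are all assumptions made for illustration; the cutoff is a common convention in the screening literature, not something stated in this record.

```python
import math
import random


def u_centered(x):
    """U-centered matrix of pairwise distances |x_i - x_j| (requires n > 3)."""
    n = len(x)
    d = [[abs(xi - xj) for xj in x] for xi in x]
    row = [sum(r) for r in d]  # row sums; d is symmetric, so also column sums
    total = sum(row)
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:  # the diagonal stays at zero by definition
                a[i][j] = (d[i][j] - row[i] / (n - 2) - row[j] / (n - 2)
                           + total / ((n - 1) * (n - 2)))
    return a


def inner(a, b):
    """Unbiased squared distance covariance from two U-centered matrices."""
    n = len(a)
    s = sum(a[i][j] * b[i][j] for i in range(n) for j in range(n))
    return s / (n * (n - 3))


def dcor2(a, b):
    """Bias-corrected squared distance correlation; can be slightly negative."""
    denom = math.sqrt(inner(a, a) * inner(b, b))
    return inner(a, b) / denom if denom > 0 else 0.0


def screen(columns, y, keep):
    """Rank covariates by dcor2 with y; return the top `keep` indices and all scores."""
    b = u_centered(y)
    scores = [dcor2(u_centered(col), b) for col in columns]
    order = sorted(range(len(columns)), key=lambda k: scores[k], reverse=True)
    return order[:keep], scores


# Toy, fully observed data (an assumption): only covariate 0 drives the response.
rng = random.Random(0)
n, p = 60, 30
columns = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(p)]
y = [3 * columns[0][i] + rng.gauss(0, 0.5) for i in range(n)]

selected, scores = screen(columns, y, keep=int(n / math.log(n)))
print(selected)
```

Because the score is computed from pairwise distances rather than from a fitted regression, no model form has to be specified, which is the sense in which the abstract calls the screening Model-Free. Handling right-censored responses would require an adjustment that this record does not describe.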