Joint feature screening method for ultrahigh dimensional censored data

Cited by: 3
Abstract: For ultrahigh-dimensional censored data, dimension reduction through feature screening removes noise variables, so that the important information in the data can be extracted and standard statistical analysis can be applied afterwards. This paper proposes a robust partial correlation coefficient for feature screening and introduces inverse probability weighting to handle censoring; together these yield a new joint feature screening method. The partial correlation measure is constructed from the conditional distribution function of the response (the failure time), so it characterizes the dependence between the response and the covariates comprehensively. Compared with the traditional Pearson partial correlation coefficient, the measure is robust to outliers in the response, heavy-tailed distributions, and heteroscedasticity. Moreover, the joint screening method built on this measure uses projection to eliminate the interference caused by correlation among covariates, which reduces false negative and false positive errors and mitigates collinearity among covariates. We establish the sure screening property of the method, give a fast iterative algorithm, and examine its finite-sample performance through simulation studies and a real data example.
Authors: PAN Jing (潘婧); CHAI Hongfeng (柴洪峰); SUN Quan (孙权); ZHOU Yong (周勇). Affiliations: Fintech Research Institute, China UnionPay, Shanghai 201201, China; FinTech Research Institute, Fudan University, Shanghai 200433, China; School of Computer Science, Fudan University, Shanghai 200433, China; MOE Key Laboratory of Advanced Theory and Applications in Statistics and Data Science, School of Statistics and Academy of Statistics and Interdisciplinary Sciences, East China Normal University, Shanghai 200062, China.
Source: Systems Engineering-Theory & Practice (《系统工程理论与实践》), indexed in EI, CSCD, and the Peking University Core Journals list; 2023, No. 1, pp. 169-190 (22 pages).
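Funding: National Key R&D Program of China, Ministry of Science and Technology (2021YFA1000101, 2021YFA1000102, 2021YFA1000104); Key Program of the National Natural Science Foundation of China (71931004); Cultivation Fund (92046005); Key Support Project of the Major Research Plan of the National Natural Science Foundation of China (92046024); Integration Project of the Major Research Plan of the National Natural Science Foundation of China (92146002).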
Keywords: ultrahigh dimensional censored data; feature screening; partial correlation coefficient; inverse probability weighting estimation; robustness
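
The abstract names inverse probability weighting as the device for handling censoring but gives no formula. As a rough illustration only, and not the authors' implementation, the sketch below computes one common form of such weights: each uncensored observation is weighted by the reciprocal of a Kaplan-Meier estimate of the censoring survival function, evaluated just before its observed time, and censored observations receive weight zero. The function names `censoring_survival` and `ipw_weights`, the use of the left limit, and the assumption that failure and censoring times are not tied are all choices made here for illustration; the paper may use a different convention.

```python
def censoring_survival(times, events):
    """Kaplan-Meier estimate of G(t) = P(C > t), where censorings play the
    role of 'events'. Returns a function that evaluates the left limit G(t-).

    times  : observed times (minimum of failure time and censoring time)
    events : 1 if the failure was observed, 0 if the observation was censored
    Assumption: no ties between failure times and censoring times.
    """
    steps = []  # (time, value of G just after that time)
    g = 1.0
    for s in sorted(set(times)):
        at_risk = sum(1 for t in times if t >= s)
        censored = sum(1 for t, d in zip(times, events) if t == s and d == 0)
        if censored:
            g *= 1.0 - censored / at_risk
            steps.append((s, g))

    def g_left(t):
        # G(t-) is the value reached after the last censoring time before t.
        value = 1.0
        for s, g_s in steps:
            if s < t:
                value = g_s
            else:
                break
        return value

    return g_left


def ipw_weights(times, events):
    """Weight delta_i / G(Y_i-) for each observation; censored cases get 0."""
    g_left = censoring_survival(times, events)
    weights = []
    for t, d in zip(times, events):
        g = g_left(t)
        weights.append(d / g if d == 1 and g > 0 else 0.0)
    return weights


# Toy data (made up for illustration): five subjects, two of them censored.
toy_times = [2, 3, 5, 7, 8]
toy_events = [1, 0, 1, 1, 0]
toy_weights = ipw_weights(toy_times, toy_events)
```

In this toy data the failure at time 2 keeps weight 1, while the failures at times 5 and 7 are up-weighted to compensate for the subject censored at time 3. A weighted screening statistic would then use these weights in place of equal weights when averaging over observations.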