Abstract
In biomedicine, clinical trials, epidemiology, and related fields, time-to-event data are often right-censored because of constraints in the study design, limited observation periods, and individual differences in when subjects enter or leave a study. The Cox model is the most widely used statistical model for analyzing the relationship between covariates and survival time from right-censored survival data. As advances in science and technology make data collection easier, databases have grown larger and more complex, and the dimension of the data often reaches the hundreds or thousands, or even higher. This paper proposes a high-dimensional controlled variable selection method for the Cox model based on Model-X Knockoffs. First, knockoff variables are constructed within the Knockoffs framework, and a regularized objective function is built from the original covariates and their corresponding knockoff variables. Next, the optimal solution of this objective function is used to construct a statistic and a data-dependent threshold. Finally, variable selection is carried out. Simulation studies and an empirical analysis show that the proposed method provides reliable false discovery rate (FDR) control while selecting variables and outperforms the traditional LASSO method.
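The final two steps of the procedure (a data-dependent threshold followed by selection) can be illustrated with a minimal sketch. The abstract does not state which statistic or threshold the authors use, so the code below assumes the standard knockoff+ threshold from the general Knockoffs literature, and the statistics `W` are hypothetical values standing in for something like the difference in absolute coefficient size between each covariate and its knockoff. The function names are illustrative only.

```python
import math


def knockoff_plus_threshold(W, q):
    """Return the smallest t among the nonzero |W_j| whose estimated
    false discovery proportion is at most q (knockoff+ rule, assumed here).
    Returns infinity when no candidate threshold qualifies."""
    candidates = sorted({abs(w) for w in W if w != 0})
    for t in candidates:
        negatives = sum(1 for w in W if w <= -t)  # knockoff beats the original
        positives = sum(1 for w in W if w >= t)   # original beats the knockoff
        fdp_hat = (1 + negatives) / max(1, positives)
        if fdp_hat <= q:
            return t
    return math.inf


def select_variables(W, q):
    """Indices of covariates whose statistic reaches the threshold."""
    t = knockoff_plus_threshold(W, q)
    return [j for j, w in enumerate(W) if w >= t]


# Hypothetical statistics, not taken from the paper.
W = [3.1, 2.4, -0.2, 1.9, 0.1, 2.8, -0.3, 1.5, 2.2, 0.0]
selected = select_variables(W, q=0.2)
print(selected)  # [0, 1, 3, 5, 7, 8]
```

Large positive values of `W` suggest that the original covariate matters more than its knockoff, while negative values serve as an estimate of how many false positives sit above a given threshold. This is what lets the threshold adapt to the data and target an FDR level `q`.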
Authors
Huang He (黄河); Pan Yingli (潘莹丽)
(Faculty of Management, Wuzhou University, Wuzhou, Guangxi 543002, China; Faculty of Mathematics and Statistics, Hubei University, Wuhan 430062, China; Hubei Key Laboratory of Applied Mathematics, Hubei University, Wuhan 430062, China)
Source
Statistics & Decision (《统计与决策》)
Peking University Core Journal (北大核心)
2023, Issue 5, pp. 16-21 (6 pages)
Funding
Supported by the National Natural Science Foundation of China (11901175).