Abstract
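For variable selection and parameter estimation in geostatistical linear regression models with high-dimensional covariates, where the spatial covariance matrix of the errors is characterized by a finite-dimensional parameter, we propose an adaptive penalized maximum likelihood estimation algorithm based on penalized least squares. We establish two types of theoretical results for diverging dimension. When the dimension diverges but remains smaller than the sample size, we obtain the convergence rate of the estimation error and sparsistency. When the dimension is much larger than the sample size (p >> n), we use the primal-dual witness technique to obtain non-asymptotic results on the convergence rate of the estimation error and the sign consistency of model selection. We find that when the spatial correlation matrix is assumed to belong to a given class (such as the Matérn class) with a finite number of parameters to be estimated, the model selection and parameter estimation results for high-dimensional covariates agree with those obtained for independent samples. Simulation studies demonstrate the effectiveness of the coordinate descent algorithm used in this paper. The method is applied to a data set of Arabidopsis genotypes (SNPs) and phenotypes such as flowering time, collected from 69 laboratories worldwide, to predict the flowering-time phenotype, which confirms the applicability and advantages of the method.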
In high-dimensional spatial data analysis, we consider the problem of selecting covariates and estimating parameters in spatial linear models with Gaussian process errors. For the fixed-dimension case, that is, with a fixed number of covariates, earlier work considered penalized maximum likelihood estimation (PMLE) and proposed a one-step sparse estimator for which consistency and the oracle property were obtained. Here we propose a spatial penalized maximum likelihood estimator for high-dimensional covariates. The optimization is carried out by a coordinate descent algorithm. The convergence rate of the parameter estimates and the sparsistency of model selection are obtained for the diverging-dimension case. Furthermore, a primal-dual witness argument leads to a non-asymptotic result on estimation and model selection consistency for the high-dimensional case p > n. Monte Carlo results show that the proposed method performs better than its competitors. A real GWAS data set of SNPs and multiple phenotypes from spatially distributed cell lines is analyzed to illustrate the findings under the geostatistical model.
Authors
褚挺进
华雨臻
丁一鸣
尹建鑫
CHU Ting-jin; HUA Yu-zhen; DING Yi-ming; YIN Jian-xin (School of Mathematics and Statistics, University of Melbourne, Melbourne 3010, Australia; Meituan, Beijing 100102, China; School of Mathematics, Renmin University of China, Beijing 100872, China; Center for Applied Statistics of Renmin University of China, Beijing 100872, China; School of Statistics, Renmin University of China, Beijing 100872, China)
Source
《数理统计与管理》
CSSCI
PKU Core Journal (Peking University Core Journals list)
2024, No. 3, pp. 407-422 (16 pages)
Journal of Applied Statistics and Management
Funding
Major Project of the Key Research Institutes of Humanities and Social Sciences, Ministry of Education of China (22JJD110001).
Keywords
spatial statistics
high-dimensional data analysis
penalized maximum likelihood estimation
primal-dual witness
coordinate descent algorithm