In this paper, we consider the problem of variable selection and model detection in varying coefficient models with longitudinal data. We propose a combined penalization procedure to select the significant variables, detect the true structure of the model, and estimate the unknown regression coefficients simultaneously. With appropriate selection of the tuning parameters, we show that the proposed procedure is consistent in both variable selection and the separation of varying and constant coefficients, and that the penalized estimators have the oracle property. Finite-sample performance of the proposed method is illustrated by simulation studies and a real data analysis.
In this paper, we focus on partially linear varying-coefficient quantile regression with missing observations under ultra-high dimension, where the missing observations include either responses or covariates, or the responses and part of the covariates are missing at random, and ultra-high dimension means that the dimension of the parameter is much larger than the sample size. Based on the B-spline method for the varying coefficient functions, we study the consistency of the oracle estimator, which is obtained using only the active covariates whose coefficients are nonzero. We also discuss the asymptotic normality of the oracle estimator for the linear parameter. Because the active covariates are unknown in practice, a non-convex penalized estimator is investigated for simultaneous variable selection and estimation, and its oracle property is also established. Finite-sample behavior of the proposed methods is investigated via simulations and real data analysis.
In practice, predictors often possess natural grouping structures. Incorporating such information can improve statistical modeling and inference. In addition, high dimensionality often leads to collinearity. The elastic net is a well-suited method that tends to exhibit a grouping effect. In this paper, we consider the problem of group selection and estimation in the sparse linear regression model in which predictors can be grouped. We investigate a group adaptive elastic-net and derive oracle inequalities and model consistency for the case where the number of groups is larger than the sample size. The oracle property is addressed for the case of a fixed number of groups. We revise the locally approximated coordinate descent algorithm to carry out the computation. Simulation and real data studies indicate that the group adaptive elastic-net is a competitive alternative for model selection in high-dimensional problems where the number of groups exceeds the sample size.
This paper employs the SCAD-penalized least squares method to simultaneously select variables and estimate the coefficients for high-dimensional covariate-adjusted linear regression models. The distorted variables are assumed to be contaminated with a multiplicative factor that is determined by the value of an unknown function of an observable covariate. The authors show that, under appropriate conditions, the SCAD-penalized least squares estimator has the so-called "oracle property". In addition, the authors suggest a BIC criterion to select the tuning parameter, and show that this criterion identifies the true model consistently for the covariate-adjusted linear regression models. Simulation studies and a real data example are used to illustrate the efficiency of the proposed estimation algorithm.
In the statistics and machine learning communities, the last fifteen years have witnessed a surge of high-dimensional models backed by penalized methods and other state-of-the-art variable selection techniques. The high-dimensional models we refer to differ from conventional models in that the total number of parameters p and the number of significant parameters s are both allowed to grow with the sample size T. When field-specific knowledge is preliminary, and in view of the recent and potential abundance of data from genetics, finance, on-line social networks, etc., such (s, T, p)-triply diverging models offer great modeling flexibility and can be used as a data-guided first step of investigation. However, model selection consistency and other theoretical properties have been addressed only for independent data, leaving time series largely uncovered. This paper applies a penalized least squares (PLS) approach to a simple linear regression model endowed with a weakly dependent sequence. Under regularity conditions, we show sign consistency, derive a finite-sample bound on the estimation error that holds with high probability, and prove that the PLS estimate is consistent in the L<sub>2</sub> norm with rate (s log s/T)<sup>1/2</sup>.
In this paper, we develop a flexible semiparametric model averaging marginal regression procedure to forecast the joint conditional quantile function of the response variable for ultrahigh-dimensional data. First, we approximate the joint conditional quantile function by a weighted average of one-dimensional marginal conditional quantile functions that have varying coefficient structures. A local linear regression technique is then employed to derive consistent estimates of the marginal conditional quantile functions. Second, based on the estimated marginal conditional quantile functions, we estimate and select the significant model weights involved in the approximation by a nonconvex penalized quantile regression. Under some relaxed conditions, we establish the asymptotic properties of the nonparametric kernel estimators and the oracle estimators of the model averaging weights. We further derive the oracle property for the proposed nonconvex penalized model averaging procedure. Finally, simulation studies and a real data analysis are conducted to illustrate the merits of the proposed model averaging method.
When the data contain outliers or follow heavy-tailed distributions, traditional penalized least squares is no longer applicable. In addition, with the rapid development of science and technology, large amounts of data with high dimension, strong correlation, and redundancy are being generated. It is therefore necessary to find an effective variable selection method that handles collinearity and is based on a robust method. This paper proposes a penalized M-estimation method based on the standard-error-adjusted adaptive elastic-net, which uses M-estimators and the corresponding standard errors as weights. The consistency and asymptotic normality of this method are proved theoretically. For regularization in high-dimensional space, the authors use the multi-step adaptive elastic-net to reduce the dimension to a relatively large scale that is less than the sample size, and then use the proposed method to select variables and estimate parameters. Finally, the authors carry out simulation studies and two real data analyses to examine the finite-sample performance of the proposed method. The results show that the proposed method has some advantages over other commonly used methods.
In many applications, covariates can be naturally grouped. For example, in gene expression data analysis, genes belonging to the same pathway might be viewed as a group. This paper studies the variable selection problem for censored survival data in the additive hazards model when covariates are grouped. A hierarchical regularization method is proposed to simultaneously estimate parameters and select important variables at both the group level and the within-group level. For situations in which the number of parameters tends to ∞ as the sample size increases, we establish an oracle property and asymptotic normality of the proposed estimators. Numerical results indicate that the hierarchically penalized method performs better than some existing methods such as lasso, smoothly clipped absolute deviation (SCAD), and adaptive lasso.
In personalised medicine, the goal is to make a treatment recommendation for each patient with a given set of covariates to maximise the treatment benefit measured by the patient's response to the treatment. In application, such a treatment assignment rule is constructed using a sample of training data consisting of patients' responses and covariates. Instead of modelling responses using treatments and covariates, an alternative approach is to maximise a response-weighted target function whose value directly reflects the effectiveness of treatment assignments. Since the target function involves a loss function, recent efforts have focused on choosing the loss function to ensure a computationally feasible and theoretically sound solution. We propose to use a smooth hinge loss function so that the target function is convex and differentiable, which gives good asymptotic properties and numerical advantages. To further simplify computation and interpretation, we focus on rules that are linear functions of the covariates and discuss their asymptotic properties. We also examine the performance of our method with simulation studies and real data analysis.
Abstract: We study the asymptotic properties of adaptive lasso estimators when some components of the parameter of interest β are strictly different from zero, while other components may be zero or may converge to zero with rate n<sup>-δ</sup>, with δ>0, where n denotes the sample size. To achieve this objective, we analyze the convergence/divergence rates of each term in the first-order conditions of adaptive lasso estimators. First, we derive conditions that allow selecting tuning parameters in order to ensure that adaptive lasso estimates of n<sup>-δ</sup>-components indeed collapse to zero. Second, in this case, we also derive asymptotic distributions of adaptive lasso estimators for nonzero components. When δ>1/2, we obtain the usual n<sup>1/2</sup>-asymptotic normal distribution, while when 0<δ≤1/2, we show n<sup>δ</sup>-consistency combined with (biased) n<sup>1/2-δ</sup>-asymptotic normality for nonzero components. We call these properties the Extended Oracle Properties. These results allow practitioners to exclude the asymptotically negligible variables from their model and make inferences on the asymptotically relevant variables.
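To make the thresholding behaviour concrete, the following is a minimal sketch of the adaptive lasso in the simplest possible setting. It assumes an orthonormal design, so the penalized problem separates by coordinate into minimising 0.5*(b - beta)^2 + lam*|beta|/|b|^gamma, where b is an initial (for example least squares) estimate. The function names, the objective scaling, and the default gamma are illustrative assumptions and are not taken from the paper.

```python
import math


def soft_threshold(z, t):
    """Return sign(z) * max(|z| - t, 0), the scalar lasso solution."""
    return math.copysign(max(abs(z) - t, 0.0), z)


def adaptive_lasso_orthonormal(b_init, lam, gamma=1.0):
    """Coordinate-wise adaptive lasso under an assumed orthonormal design.

    Each coordinate minimises 0.5*(b - beta)**2 + lam*|beta|/|b|**gamma,
    so small initial estimates receive large weights and are set to zero.
    """
    estimates = []
    for b in b_init:
        if b == 0.0:
            # The weight 1/|b|**gamma is infinite, so the estimate stays at zero.
            estimates.append(0.0)
            continue
        weight = 1.0 / abs(b) ** gamma
        estimates.append(soft_threshold(b, lam * weight))
    return estimates


# Hypothetical initial estimates: two clearly nonzero, one close to zero.
fitted = adaptive_lasso_orthonormal([2.0, 0.1, -1.5], lam=0.2)
```

In this toy call the small coefficient is shrunk exactly to zero while the two large ones are only mildly shrunk, which is the qualitative behaviour the abstract describes for components that converge to zero.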
Funding: Supported by the National Natural Science Foundation of China (11501578, 11501579, 11701571, 41572315) and the Fundamental Research Funds for the Central Universities (CUGW150809).
Abstract: The seamless-L0 (SELO) penalty is a smooth function on [0, ∞) that very closely resembles the L0 penalty and has been demonstrated, theoretically and practically, to be effective in nonconvex penalization for variable selection. In this paper, we first generalize SELO to a class of penalties retaining the good features of SELO, and then propose variable selection and estimation in linear models using the proposed generalized SELO (GSELO) penalized least squares (PLS) approach. We show that the GSELO-PLS procedure possesses the oracle property and consistently selects the true model under some regularity conditions in the presence of a diverging number of variables. The entire path of GSELO-PLS estimates can be efficiently computed through a smoothing quasi-Newton (SQN) method. A modified BIC coupled with a continuation strategy is developed to select the optimal tuning parameter. Simulation studies and an analysis of a clinical data set are carried out to evaluate the finite-sample performance of the proposed method. In addition, numerical experiments involving simulation studies and an analysis of a microarray data set are conducted for GSELO-PLS in high-dimensional settings.
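For orientation, here is a small sketch of the original SELO penalty as it is commonly written, (lam / log 2) * log(|beta| / (|beta| + tau) + 1). This is the base penalty only, not the generalized GSELO class proposed in the paper, and the function name and the default value of tau are assumptions made for illustration.

```python
import math


def selo_penalty(beta, lam, tau=0.01):
    """SELO penalty in its commonly cited form (assumed, not from this abstract).

    The value is 0 at beta = 0 and approaches lam as |beta| grows, which is
    how it mimics the L0 penalty lam * 1{beta != 0} when tau is small.
    """
    b = abs(beta)
    return (lam / math.log(2.0)) * math.log(b / (b + tau) + 1.0)


# Evaluate the penalty on a few hypothetical coefficient values.
values = [selo_penalty(b, lam=1.0) for b in (0.0, 0.001, 0.1, 10.0)]
```

With a small tau the penalty rises quickly from 0 and then flattens near lam, so moderately large coefficients are penalized almost equally, as with L0.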
Funding: Supported by the National Social Science Foundation of China under Grant No. 18BTJ040.
Abstract: This paper discusses the asymptotic properties of the SCAD (smoothly clipped absolute deviation) penalized quasi-likelihood estimator for generalized linear models with adaptive designs, extending the related results for independent observations to dependent observations. Under certain conditions, the authors prove that the SCAD penalized method correctly selects covariates with nonzero coefficients with probability converging to one, and that the penalized quasi-likelihood estimators of the nonzero coefficients have the same asymptotic distribution they would have if the zero coefficients were known in advance. That is, the SCAD estimator has the consistency and oracle properties. Finally, the results are illustrated by some simulations.
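As a reference point, the SCAD penalty in its standard piecewise form (Fan and Li, 2001) can be sketched as follows. The default a = 3.7 is the value conventionally suggested in that literature; the function name is illustrative, and the sketch covers only the penalty, not the quasi-likelihood estimator studied in the paper.

```python
def scad_penalty(theta, lam, a=3.7):
    """Standard SCAD penalty for a scalar coefficient (requires a > 2, lam > 0)."""
    t = abs(theta)
    if t <= lam:
        # Lasso-like linear part near zero.
        return lam * t
    if t <= a * lam:
        # Quadratic transition that gradually relaxes the penalization rate.
        return (2.0 * a * lam * t - t * t - lam * lam) / (2.0 * (a - 1.0))
    # Constant beyond a*lam, so large coefficients are not shrunk further.
    return lam * lam * (a + 1.0) / 2.0


# Hypothetical coefficients covering the three regions of the penalty.
values = [scad_penalty(x, lam=1.0) for x in (0.5, 2.0, 10.0)]
```

The flat region for large coefficients is what removes the lasso's bias on strong signals and underlies the oracle property described above.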
Funding: Supported by the National Natural Science Foundation of China (Grant No. 11201306), the Innovation Program of Shanghai Municipal Education Commission (Grant No. 13YZ065), the Fundamental Research Project of Shanghai Normal University (Grant No. SK201207), the scholarship under the State Scholarship Fund by the China Scholarship Council in 2011, and the Research Grant Council of Hong Kong, Hong Kong, China (Grant No. #HKBU2028/10P).
Abstract: For analyzing correlated binary data with high-dimensional covariates, we propose a two-stage shrinkage approach. First, we construct a weighted least-squares (WLS) type function using a special weighting scheme on the non-conservative vector field of the generalized estimating equations (GEE) model. Second, we define a penalized WLS in the spirit of the adaptive LASSO for simultaneous variable selection and parameter estimation. The proposed procedure enjoys the oracle properties in a high-dimensional framework where the number of parameters grows to infinity with the number of clusters. Moreover, we prove the consistency of the sandwich formula of the covariance matrix even when the working correlation matrix is misspecified. For the selection of the tuning parameter, we develop a consistent penalized quadratic form (PQF) function criterion. The performance of the proposed method is assessed through a comparison with existing methods and through an application to a crossover trial in a pain relief study.
Funding: Supported by the National Natural Science Foundation of China (10901162), the Fundamental Research Funds for the Central Universities and the Research Funds of Renmin University of China (10XNF073), and the China Postdoctoral Science Foundation (2014M550799).
Abstract: In this paper, we investigate the variable selection problem for generalized regression models. To estimate the regression parameter, a procedure combining the rank correlation method and the adaptive lasso technique is developed and is proved to have the oracle properties. A modified IMO (iterative marginal optimization) algorithm that directly aims to maximize the penalized rank correlation function is proposed. The performance of the estimating procedure is illustrated by simulation studies.
Abstract: The integer-valued generalized autoregressive conditional heteroskedastic (INGARCH) model is often used to describe data in biostatistics, such as the number of people infected with dengue fever, the daily seizure counts of an epileptic patient, and the number of cases of campylobacteriosis infections. Since the structure of such data is generally high-order and sparse, order shrinkage and selection for the model has attracted much attention. In this paper, we propose a penalized conditional maximum likelihood (PCML) method to solve this problem. The PCML method can effectively select significant orders and estimate the parameters simultaneously. Some simulations and a real data analysis are carried out to illustrate the usefulness of our method.
Abstract: In this paper, the authors investigate three aspects of statistical inference for partially linear regression models where some covariates are measured with errors. First, a bandwidth selection procedure is proposed, which combines the difference-based technique and the GCV method. Second, a goodness-of-fit test procedure is proposed, which is an extension of the generalized likelihood technique. Third, a variable selection procedure for the parametric part is provided based on nonconcave penalization and corrected profile least squares. As in "Variable selection via nonconcave penalized likelihood and its oracle properties" (J. Amer. Statist. Assoc., 96, 2001, 1348-1360), it is shown that the resulting estimator has an oracle property with a proper choice of regularization parameters and penalty function. Simulation studies are conducted to illustrate the finite-sample performance of the proposed procedures.
Funding: Supported by the National Natural Science Foundation of China under Grant No. 61272041.
Abstract: In this paper, based on spline approximation, the authors propose a unified variable selection approach for the single-index model via an adaptive L1 penalty. The computation of the proposed estimators is based on the well-known LARS algorithm. Under some regularity conditions, the authors establish the asymptotic properties of the proposed estimators and the oracle properties of adaptive LASSO (aLASSO) variable selection. Simulations are used to investigate the performance of the proposed estimator and illustrate that it is effective for simultaneous variable selection and estimation in single-index models.
Funding: Supported partly by the National Natural Science Foundation of China (Grant No. 11071045) and the Shanghai Leading Academic Discipline Project (Grant No. B210).
Abstract: This paper considers variable selection for moment restriction models. We propose a penalized empirical likelihood (PEL) approach that has desirable asymptotic properties comparable to those of the penalized likelihood approach, which relies on a correct parametric likelihood specification. In addition to being consistent and having the oracle property, PEL admits inference on the parameter without having to estimate the covariance of its estimator. An approximate algorithm, along with a consistent BIC-type criterion for selecting the tuning parameters, is provided for PEL. The proposed algorithm enjoys considerable computational efficiency and overcomes the drawback of the local quadratic approximation of nonconcave penalties. Simulation studies that evaluate and compare the performance of our method with that of existing ones show that PEL is competitive and robust. The proposed method is illustrated with two real examples.
Abstract: We consider the problem of variable selection for the single-index varying-coefficient model, and present a regularized variable selection procedure that combines basis function approximations with the SCAD penalty. The proposed procedure simultaneously selects significant covariates with functional coefficients and locally significant variables with parametric coefficients. With appropriate selection of the tuning parameters, the consistency of the variable selection procedure and the oracle property of the estimators are established. The proposed method can naturally be applied to the pure single-index model and the varying-coefficient model. Finite sample performance of the proposed method is illustrated by a simulation study and a real data analysis.
Funding: Supported by National Natural Science Foundation of China (Grant Nos. 71271128 and 11101442), the State Key Program of National Natural Science Foundation of China (Grant No. 71331006), National Center for Mathematics and Interdisciplinary Sciences (NCMIS), Shanghai Leading Academic Discipline Project A, in Ranking Top of Shanghai University of Finance and Economics (IRTSHUFE), and Scientific Research Innovation Fund for PhD Studies (Grant No. CXJJ-2011-434).
Abstract: The varying-coefficient model is flexible and powerful for modeling the dynamic changes of regression coefficients. We study the problem of variable selection and estimation in this model in the sparse, high-dimensional case. We develop a concave group selection approach for this problem using basis function expansion and study its theoretical and empirical properties. We also apply the group Lasso for variable selection and estimation in this model and study its properties. Under appropriate conditions, we show that the group least absolute shrinkage and selection operator (Lasso) selects a model whose dimension is comparable to that of the underlying model, regardless of the large number of unimportant variables. To improve the selection results, we show that the group minimax concave penalty (MCP) has the oracle selection property in the sense that it correctly selects important variables with probability converging to one under suitable conditions. By comparison, the group Lasso does not have the oracle selection property. The group Lasso and the group MCP are both evaluated using simulation and demonstrated on a data example.
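The contrast between the group Lasso and the group MCP in the abstract above can be sketched numerically. The abstract does not state the penalty formulas, so the block below assumes the standard MCP form, applied to the Euclidean norm of each coefficient group with a per-group weight of sqrt(group size); the function names, the weighting and the default `gamma=3.0` are illustrative assumptions, not the authors' implementation.

```python
import math

def mcp(t, lam, gamma=3.0):
    """Standard minimax concave penalty evaluated at |t| (assumed form)."""
    t = abs(t)
    if t <= gamma * lam:
        # Concave region: the penalization rate decreases as |t| grows.
        return lam * t - t * t / (2.0 * gamma)
    # Flat region: large coefficients receive a constant penalty.
    return gamma * lam * lam / 2.0

def group_mcp_penalty(groups, lam, gamma=3.0):
    """Sum of MCP penalties on group norms, weighted by sqrt(group size).

    The sqrt(size) weighting is a common convention and an assumption here.
    """
    total = 0.0
    for g in groups:
        norm = math.sqrt(sum(b * b for b in g))
        total += mcp(norm, lam * math.sqrt(len(g)), gamma)
    return total

def group_lasso_penalty(groups, lam):
    """Group Lasso penalty with the same weighting, for comparison."""
    return sum(lam * math.sqrt(len(g)) * math.sqrt(sum(b * b for b in g))
               for g in groups)
```

Because the MCP flattens out once a group norm exceeds `gamma * lam`, large groups are not shrunk further, whereas the group Lasso penalty keeps growing linearly in the norm. This is the usual intuition for why a concave penalty can attain an oracle selection property while the group Lasso need not.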
Funding: Supported by National Natural Science Foundation of China (Grant Nos. 11501522, 11101014, 11001118 and 11171012), National Statistical Research Projects (Grant No. 2014LZ45), the Doctoral Fund of Innovation of Beijing University of Technology, the Science and Technology Project of the Faculty Adviser of Excellent PhD Degree Thesis of Beijing (Grant No. 20111000503), and the Beijing Municipal Education Commission Foundation (Grant No. KM201110005029).
Abstract: In this paper, we consider the problem of variable selection and model detection in varying coefficient models with longitudinal data. We propose a combined penalization procedure to select the significant variables, detect the true structure of the model and estimate the unknown regression coefficients simultaneously. With appropriate selection of the tuning parameters, we show that the proposed procedure is consistent in both variable selection and the separation of varying and constant coefficients, and that the penalized estimators have the oracle property. Finite sample performance of the proposed method is illustrated by simulation studies and a real data analysis.
Funding: Supported by National Natural Science Foundation of China (Grant No. 12071348) and Fundamental Research Funds for Central Universities, China (Grant No. 2023-3-2D-04).
Abstract: In this paper, we focus on partially linear varying-coefficient quantile regression with missing observations under ultra-high dimension, where either the responses, the covariates, or both the responses and part of the covariates are missing at random, and where ultra-high dimension means that the dimension of the parameter is much larger than the sample size. Based on the B-spline method for the varying coefficient functions, we study the consistency of the oracle estimator, which is obtained using only the active covariates whose coefficients are nonzero. We also discuss the asymptotic normality of the oracle estimator for the linear parameter. Since the active covariates are unknown in practice, a non-convex penalized estimator is investigated for simultaneous variable selection and estimation, and its oracle property is also established. Finite sample behavior of the proposed methods is investigated via simulations and a real data analysis.
Funding: Supported by National Natural Science Foundation of China (Grant No. 11571219), the Open Research Fund Program of Key Laboratory of Mathematical Economics (SUFE), Ministry of Education (Grant No. 201309KF02), and Changjiang Scholars and Innovative Research Team in University (Grant No. IRT13077).
Abstract: In practice, predictors often possess natural grouping structures. Incorporating such information can improve statistical modeling and inference. In addition, high dimensionality often leads to collinearity. The elastic net is well suited to this setting because it tends to exhibit a grouping effect. In this paper, we consider the problem of group selection and estimation in the sparse linear regression model in which predictors can be grouped. We investigate a group adaptive elastic-net and derive oracle inequalities and model consistency for the case where the number of groups is larger than the sample size. The oracle property is addressed for the case of a fixed number of groups. We revise the locally approximated coordinate descent algorithm to carry out the computation. Simulation and real data studies indicate that the group adaptive elastic-net is a competitive alternative for model selection in high-dimensional problems where the number of groups is larger than the sample size.
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 11471029, 11101014, 61273221 and 11171010, the Beijing Natural Science Foundation under Grant Nos. 1142002 and 1112001, the Science and Technology Project of Beijing Municipal Education Commission under Grant No. KM201410005010, and the Research Fund for the Doctoral Program of Beijing University of Technology under Grant No. 006000543114550.
Abstract: This paper employs the SCAD-penalized least squares method to simultaneously select variables and estimate the coefficients for high-dimensional covariate adjusted linear regression models. The distorted variables are assumed to be contaminated with a multiplicative factor that is determined by the value of an unknown function of an observable covariate. The authors show that, under some appropriate conditions, the SCAD-penalized least squares estimator has the so-called "oracle property". In addition, the authors suggest a BIC criterion to select the tuning parameter, and show that the BIC criterion is able to identify the true model consistently for the covariate adjusted linear regression models. Simulation studies and a real data example are used to illustrate the efficiency of the proposed estimation algorithm.
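Several abstracts in this listing rely on the SCAD penalty without writing it out. As a minimal sketch, the block below evaluates the usual piecewise form of the penalty (linear near zero, quadratic in between, constant for large coefficients). The function name and the default `a = 3.7`, a commonly used value, are assumptions for illustration; none of the abstracts specify them.

```python
def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty at a single coefficient (standard piecewise form).

    lam is the tuning parameter; a > 2 controls where the penalty flattens.
    """
    if lam < 0 or a <= 2:
        raise ValueError("require lam >= 0 and a > 2")
    t = abs(theta)
    if t <= lam:
        # Lasso-like region: linear in |theta|.
        return lam * t
    if t <= a * lam:
        # Quadratic transition joining the linear and constant pieces.
        return (2.0 * a * lam * t - t * t - lam * lam) / (2.0 * (a - 1.0))
    # Constant region: large coefficients are not penalized further.
    return lam * lam * (a + 1.0) / 2.0
```

The constant region is what removes the bias on large coefficients that the plain Lasso incurs, which is the informal reason SCAD-type estimators can behave like an oracle estimator under suitable conditions.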
Funding: Supported by National Science Foundation of USA (Grant Nos. DMS1206464 and DMS1613338) and National Institutes of Health of USA (Grant Nos. R01GM072611, R01GM100474 and R01GM120507).
Abstract: In the statistics and machine learning communities, the last fifteen years have witnessed a surge of high-dimensional models backed by penalized methods and other state-of-the-art variable selection techniques. The high-dimensional models we refer to differ from conventional models in that the number of all parameters p and the number of significant parameters s are both allowed to grow with the sample size T. When field-specific knowledge is preliminary, and in view of the recent and potential abundance of data from genetics, finance, on-line social networks, etc., such (s, T, p)-triply diverging models offer great modeling flexibility and can be used as a data-guided first step of investigation. However, model selection consistency and other theoretical properties have been addressed only for independent data, leaving time series largely uncovered. This paper applies a penalized least squares (PLS) approach to a simple linear regression model endowed with a weakly dependent sequence. Under regularity conditions, we show sign consistency, derive a finite sample bound that holds with high probability for the estimation error, and prove that the PLS estimate is consistent in the L<sub>2</sub> norm with rate (s log s/T)<sup>1/2</sup>.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 12201091), Natural Science Foundation of Chongqing (Grant Nos. CSTB2022NSCQ-MSX0852, cstc2021jcyj-msxmX0502), Innovation Support Program for Chongqing Overseas Returnees (Grant No. cx2020025), Science and Technology Research Program of Chongqing Municipal Education Commission (Grant Nos. KJQN202100526, KJQN201900511), the National Statistical Science Research Program (Grant No. 2022LY019), and Chongqing University Innovation Research Group Project: Nonlinear Optimization Method and Its Application (Grant No. CXQT20014).
Abstract: In this paper, we develop a flexible semiparametric model averaging marginal regression procedure to forecast the joint conditional quantile function of the response variable for ultrahigh-dimensional data. First, we approximate the joint conditional quantile function by a weighted average of one-dimensional marginal conditional quantile functions that have varying coefficient structures. Then, a local linear regression technique is employed to derive consistent estimates of the marginal conditional quantile functions. Second, based on the estimated marginal conditional quantile functions, we estimate and select the significant model weights involved in the approximation by a nonconvex penalized quantile regression. Under some relaxed conditions, we establish the asymptotic properties of the nonparametric kernel estimators and oracle estimators of the model averaging weights. We further derive the oracle property for the proposed nonconvex penalized model averaging procedure. Finally, simulation studies and a real data analysis are conducted to illustrate the merits of our proposed model averaging method.
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 12271294, 12171225 and 12071248.
Abstract: When the data contain outliers or follow heavy-tailed distributions, traditional penalized least squares is no longer applicable. In addition, with the rapid development of science and technology, large amounts of data with high dimension, strong correlation and redundancy have been generated in real life. It is therefore necessary to find an effective, robust variable selection method that can deal with collinearity. This paper proposes a penalized M-estimation method based on the standard error adjusted adaptive elastic-net, which uses M-estimators and the corresponding standard errors as weights. The consistency and asymptotic normality of this method are proved theoretically. For regularization in high-dimensional space, the authors use the multi-step adaptive elastic-net to reduce the dimension to a relatively large scale that is still less than the sample size, and then use the proposed method to select variables and estimate parameters. Finally, the authors carry out simulation studies and two real data analyses to examine the finite sample performance of the proposed method. The results show that the proposed method has some advantages over other commonly used methods.
Funding: Supported by National Natural Science Foundation of China (Grant Nos. 11171112, 11101114 and 11201190) and National Statistical Science Research Major Program of China (Grant No. 2011LZ051).
Abstract: In many applications, covariates can be naturally grouped. For example, in gene expression data analysis, genes belonging to the same pathway might be viewed as a group. This paper studies the variable selection problem for censored survival data in the additive hazards model when covariates are grouped. A hierarchical regularization method is proposed to simultaneously estimate parameters and select important variables at both the group level and the within-group level. For situations in which the number of parameters tends to ∞ as the sample size increases, we establish an oracle property and asymptotic normality of the proposed estimators. Numerical results indicate that the hierarchically penalized method performs better than some existing methods such as lasso, smoothly clipped absolute deviation (SCAD) and adaptive lasso.
Funding: Research reported in this article was partially funded through a Patient-Centered Outcomes Research Institute (PCORI) Award [ME-1409-21219]. The second author's research was also partially supported by the Chinese 111 Project [B14019] and the US National Science Foundation [grant number DMS-1612873].
Abstract: In personalised medicine, the goal is to make a treatment recommendation for each patient with a given set of covariates so as to maximise the treatment benefit, measured by the patient's response to the treatment. In application, such a treatment assignment rule is constructed using a training sample consisting of patients' responses and covariates. Instead of modelling responses using treatments and covariates, an alternative approach is to maximise a response-weighted target function whose value directly reflects the effectiveness of treatment assignments. Since the target function involves a loss function, recent efforts have focused on choosing the loss function to ensure a computationally feasible and theoretically sound solution. We propose to use a smooth hinge loss function so that the target function is convex and differentiable, which yields good asymptotic properties and numerical advantages. To further simplify computation and interpretation, we focus on rules that are linear functions of covariates and discuss their asymptotic properties. We also examine the performance of our method with simulation studies and a real data analysis.
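The abstract above does not give the formula of its smooth hinge loss. For orientation only, the block below sketches one widely used smoothed hinge (quadratic between margins 0 and 1, linear below 0, zero above 1), which is convex and differentiable everywhere. Treat the exact form and the function names as assumptions; the authors' loss may differ.

```python
def smooth_hinge(z):
    """One common smoothed hinge loss at margin z (assumed form)."""
    if z >= 1.0:
        return 0.0                    # correctly classified with margin
    if z <= 0.0:
        return 0.5 - z                # linear part, slope -1
    return 0.5 * (1.0 - z) ** 2       # quadratic part joins the two smoothly

def smooth_hinge_grad(z):
    """Derivative of smooth_hinge; continuous at z = 0 and z = 1."""
    if z >= 1.0:
        return 0.0
    if z <= 0.0:
        return -1.0
    return z - 1.0
```

Unlike the ordinary hinge max(0, 1 - z), which has a kink at z = 1, this version has a continuous derivative, so gradient-based solvers can be applied directly to a response-weighted objective built from it.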