Abstract: We study the asymptotic properties of adaptive lasso estimators when some components of the parameter of interest β are strictly different from zero, while other components may be zero or may converge to zero at rate n^{-δ}, with δ > 0, where n denotes the sample size. To achieve this objective, we analyze the convergence/divergence rates of each term in the first-order conditions of adaptive lasso estimators. First, we derive conditions that allow selecting tuning parameters in order to ensure that adaptive lasso estimates of n^{-δ}-components indeed collapse to zero. Second, in this case, we also derive asymptotic distributions of adaptive lasso estimators for nonzero components. When δ > 1/2, we obtain the usual n^{1/2}-asymptotic normal distribution, while when 0 < δ ≤ 1/2, we show n^{δ}-consistency combined with (biased) n^{1/2-δ}-asymptotic normality for nonzero components. We call these properties the Extended Oracle Properties. These results allow practitioners to exclude the asymptotically negligible variables from their model and make inferences on the asymptotically relevant variables.
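To make the role of the tuning parameter concrete, the following is a minimal sketch of the adaptive lasso in the special case of a design with orthonormal columns, where the estimator reduces to coordinate-wise soft-thresholding. The function name, the toy data and the objective scaling are assumptions made for illustration; this is not the general setting analysed in the abstract.

```python
import numpy as np

def adaptive_lasso_orthonormal(X, y, lam, gamma=1.0):
    """Adaptive lasso when X has orthonormal columns (X.T @ X = I).

    Minimises 0.5 * ||y - X b||^2 + lam * sum_j w_j * |b_j|
    with data-driven weights w_j = 1 / |b_ols_j| ** gamma.
    (Illustrative assumption: orthonormal design, so a closed form exists.)
    """
    b_ols = X.T @ y                          # OLS solution when X.T @ X = I
    weights = 1.0 / np.abs(b_ols) ** gamma   # small OLS estimate -> large penalty
    # Soft-thresholding with a coefficient-specific threshold lam * w_j.
    return np.sign(b_ols) * np.maximum(np.abs(b_ols) - lam * weights, 0.0)

# Toy example: the second coefficient is small and is shrunk exactly to zero.
X = np.eye(4)[:, :3]
y = np.array([2.0, 0.1, -1.0, 0.3])
beta_hat = adaptive_lasso_orthonormal(X, y, lam=0.05)
```

Because the threshold for each coefficient is inversely proportional to its initial estimate, small components are removed while large components are only lightly shrunk.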
Funding: Supported by the National Natural Science Foundation of China (11501578, 11501579, 11701571, 41572315) and the Fundamental Research Funds for the Central Universities (CUGW150809).
Abstract: The seamless-L0 (SELO) penalty is a smooth function on [0, ∞) that very closely resembles the L0 penalty, which has been demonstrated theoretically and practically to be effective in nonconvex penalization for variable selection. In this paper, we first generalize SELO to a class of penalties retaining good features of SELO, and then propose variable selection and estimation in linear models using the proposed generalized SELO (GSELO) penalized least squares (PLS) approach. We show that the GSELO-PLS procedure possesses the oracle property and consistently selects the true model under some regularity conditions in the presence of a diverging number of variables. The entire path of GSELO-PLS estimates can be efficiently computed through a smoothing quasi-Newton (SQN) method. A modified BIC coupled with a continuation strategy is developed to select the optimal tuning parameter. Simulation studies and an analysis of a clinical data set are carried out to evaluate the finite sample performance of the proposed method. In addition, numerical experiments involving simulation studies and an analysis of a microarray data set are also conducted for GSELO-PLS in high-dimensional settings.
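For orientation, here is a small sketch of the original SELO penalty in the form it is commonly written, p(β) = (λ / log 2) · log(|β| / (|β| + τ) + 1). The formula is recalled from the SELO literature rather than taken from this abstract, and it is the original SELO penalty, not the generalized GSELO class proposed in the paper; the function name is hypothetical.

```python
import math

def selo_penalty(beta, lam, tau):
    """SELO penalty (commonly cited form; assumed here, requires tau > 0).

    Equals 0 at beta = 0 and approaches lam as |beta| grows, so for small
    tau it closely mimics the L0 penalty lam * 1{beta != 0}.
    """
    a = abs(beta)
    return (lam / math.log(2.0)) * math.log(a / (a + tau) + 1.0)

# For small tau, even a modest coefficient is penalised by almost lam.
p_zero = selo_penalty(0.0, lam=1.0, tau=0.01)
p_one = selo_penalty(1.0, lam=1.0, tau=0.01)
```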
Funding: Supported by the National Natural Science Foundation of China (Grant No. 12201091); the Natural Science Foundation of Chongqing (Grant Nos. CSTB2022NSCQ-MSX0852, cstc2021jcyj-msxmX0502); the Innovation Support Program for Chongqing Overseas Returnees (Grant No. cx2020025); the Science and Technology Research Program of Chongqing Municipal Education Commission (Grant Nos. KJQN202100526, KJQN201900511); the National Statistical Science Research Program (Grant No. 2022LY019); and the Chongqing University Innovation Research Group Project: Nonlinear Optimization Method and Its Application (Grant No. CXQT20014).
Abstract: In this paper, we develop a flexible semiparametric model averaging marginal regression procedure to forecast the joint conditional quantile function of the response variable for ultrahigh-dimensional data. First, we approximate the joint conditional quantile function by a weighted average of one-dimensional marginal conditional quantile functions that have varying coefficient structures. Then, a local linear regression technique is employed to derive consistent estimates of the marginal conditional quantile functions. Second, based on the estimated marginal conditional quantile functions, we estimate and select the significant model weights involved in the approximation by a nonconvex penalized quantile regression. Under some relaxed conditions, we establish the asymptotic properties of the nonparametric kernel estimators and the oracle estimators of the model averaging weights. We further derive the oracle property for the proposed nonconvex penalized model averaging procedure. Finally, simulation studies and a real data analysis are conducted to illustrate the merits of our proposed model averaging method.
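Quantile regression procedures of this kind are built on the check (pinball) loss ρ_τ(u) = u · (τ − 1{u < 0}). The sketch below shows this standard loss and verifies on a toy sample that minimising its sum recovers a sample quantile; the function names and data are illustrative assumptions and do not reproduce the paper's model averaging procedure.

```python
def check_loss(u, tau):
    """Quantile check loss: tau * u for u >= 0, (tau - 1) * u for u < 0."""
    return u * (tau - (1.0 if u < 0 else 0.0))

def best_constant(sample, tau):
    """Return the sample point that minimises the total check loss."""
    return min(sample, key=lambda q: sum(check_loss(x - q, tau) for x in sample))

# With tau = 0.5 the minimiser is the sample median.
sample = [1.0, 2.0, 3.0, 4.0, 5.0]
median_fit = best_constant(sample, tau=0.5)
```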
Funding: Supported by the National Natural Science Foundation of China (Grant No. 12071348) and the Fundamental Research Funds for Central Universities, China (Grant No. 2023-3-2D-04).
Abstract: In this paper, we focus on partially linear varying-coefficient quantile regression with missing observations under ultra-high dimension, where the missing observations include either responses or covariates, or the responses and part of the covariates are missing at random, and ultra-high dimension means that the dimension of the parameter is much larger than the sample size. Based on the B-spline method for the varying coefficient functions, we study the consistency of the oracle estimator, which is obtained using only the active covariates whose coefficients are nonzero. At the same time, we discuss the asymptotic normality of the oracle estimator for the linear parameter. Since the active covariates are unknown in practice, a non-convex penalized estimator is investigated for simultaneous variable selection and estimation, and its oracle property is also established. The finite sample behavior of the proposed methods is investigated via simulations and real data analysis.
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 12271294, 12171225 and 12071248.
Abstract: When there are outliers or heavy-tailed distributions in the data, traditional penalized least squares is no longer applicable. In addition, with the rapid development of science and technology, a large amount of data with high dimension, strong correlation and redundancy has been generated in real life. It is therefore necessary to find an effective variable selection method that deals with collinearity and is based on a robust method. This paper proposes a penalized M-estimation method based on the standard error adjusted adaptive elastic-net, which uses M-estimators and the corresponding standard errors as weights. The consistency and asymptotic normality of this method are proved theoretically. For regularization in high-dimensional space, the authors use the multi-step adaptive elastic-net to reduce the dimension to a relatively large scale which is less than the sample size, and then use the proposed method to select variables and estimate parameters. Finally, the authors carry out simulation studies and two real data analyses to examine the finite sample performance of the proposed method. The results show that the proposed method has some advantages over other commonly used methods.
Abstract: We consider the problem of variable selection for the single-index varying-coefficient model, and present a regularized variable selection procedure that combines basis function approximations with the SCAD penalty. The proposed procedure simultaneously selects significant covariates with functional coefficients and locally significant variables with parametric coefficients. With appropriate selection of the tuning parameters, the consistency of the variable selection procedure and the oracle property of the estimators are established. The proposed method can naturally be applied to the pure single-index model and the varying-coefficient model. The finite sample performance of the proposed method is illustrated by a simulation study and a real data analysis.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 11501522, 11101014, 11001118 and 11171012); National Statistical Research Projects (Grant No. 2014LZ45); the Doctoral Fund of Innovation of Beijing University of Technology; the Science and Technology Project of the Faculty Adviser of Excellent PhD Degree Thesis of Beijing (Grant No. 20111000503); and the Beijing Municipal Education Commission Foundation (Grant No. KM201110005029).
Abstract: In this paper, we consider the problem of variable selection and model detection in varying coefficient models with longitudinal data. We propose a combined penalization procedure to select the significant variables, detect the true structure of the model and estimate the unknown regression coefficients simultaneously. With appropriate selection of the tuning parameters, we show that the proposed procedure is consistent in both variable selection and the separation of varying and constant coefficients, and that the penalized estimators have the oracle property. The finite sample performance of the proposed method is illustrated by some simulation studies and a real data analysis.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 11571219); the Open Research Fund Program of Key Laboratory of Mathematical Economics (SUFE) (Grant No. 201309KF02), Ministry of Education; and Changjiang Scholars and Innovative Research Team in University (Grant No. IRT13077).
Abstract: In practice, predictors often possess natural grouping structures. Incorporating such useful information can improve statistical modeling and inference. In addition, high dimensionality often leads to collinearity. The elastic net is well suited to this situation because it tends to exhibit a grouping effect. In this paper, we consider the problem of group selection and estimation in the sparse linear regression model in which predictors can be grouped. We investigate a group adaptive elastic-net and derive oracle inequalities and model consistency for the case where the number of groups is larger than the sample size. The oracle property is addressed for the case of a fixed number of groups. We revise the locally approximated coordinate descent algorithm to carry out the computation. Simulation and real data studies indicate that the group adaptive elastic-net is a competitive alternative for model selection in high-dimensional problems where the number of groups is larger than the sample size.
Funding: Research reported in this article was partially funded through a Patient-Centered Outcomes Research Institute (PCORI) Award [ME-1409-21219]. The second author's research was also partially supported by the Chinese 111 Project [B14019] and the US National Science Foundation [grant number DMS-1612873].
Abstract: In personalised medicine, the goal is to make a treatment recommendation for each patient with a given set of covariates to maximise the treatment benefit measured by the patient's response to the treatment. In application, such a treatment assignment rule is constructed using a sample of training data consisting of patients' responses and covariates. Instead of modelling responses using treatments and covariates, an alternative approach is maximising a response-weighted target function whose value directly reflects the effectiveness of treatment assignments. Since the target function involves a loss function, efforts have been made recently on the choice of the loss function to ensure a computationally feasible and theoretically sound solution. We propose to use a smooth hinge loss function so that the target function is convex and differentiable, which yields good asymptotic properties and numerical advantages. To further simplify computation and improve interpretability, we focus on rules that are linear functions of the covariates and discuss their asymptotic properties. We also examine the performance of our method with simulation studies and real data analysis.
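The abstract does not state which smooth hinge loss is used. As an illustration only, the sketch below implements one widely used smoothed hinge (quadratic on 0 < z < 1, linear for z ≤ 0), which is convex and continuously differentiable; the function names are hypothetical and the paper's actual loss may differ.

```python
def smooth_hinge(z):
    """One common smoothed hinge loss (an assumed form, not the paper's)."""
    if z >= 1.0:
        return 0.0
    if z > 0.0:
        return 0.5 * (1.0 - z) ** 2
    return 0.5 - z

def smooth_hinge_grad(z):
    """Derivative of smooth_hinge; continuous at z = 0 and z = 1."""
    if z >= 1.0:
        return 0.0
    if z > 0.0:
        return z - 1.0
    return -1.0

# The two pieces meet smoothly at z = 0 (value 0.5, slope -1).
value_at_zero = smooth_hinge(0.0)
slope_at_zero = smooth_hinge_grad(0.0)
```

Unlike the ordinary hinge max(0, 1 − z), this loss has no kink at z = 1, so gradient-based optimisers can be applied directly.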
Funding: Supported by the SHUFE Graduate Innovation and Creativity Funds (No. 2011130151); by grants from the National Natural Science Foundation of China (NSFC) (No. 11071154); and partially by the Leading Academic Discipline Program, 211 Project for Shanghai University of Finance and Economics.
Abstract: This paper is concerned with statistical inference for the partially linear varying coefficient dynamic panel data model with incidental parameters, including efficient estimation of the parametric and nonparametric components and consistent determination of the lagged order. For the parametric component, we propose an efficient semiparametric generalized method-of-moments (GMM) estimator and establish its asymptotic normality. For the nonparametric component, B-spline series approximation is employed to estimate the unknown coefficient functions, which are shown to achieve the optimal nonparametric convergence rate. A consistent estimator of the variance of the error component is also constructed. In addition, by using smooth-threshold GMM estimating equations, we propose a variable selection method that identifies the significant order of lagged terms automatically and removes the irrelevant regressors by setting their coefficients to zero. As a result, it can consistently determine the true lagged order and identify the significant exogenous variables. Further studies show that the resulting estimator has the same asymptotic properties as if the true lagged order and significant regressors were known a priori, i.e., it achieves the oracle property. Numerical experiments are conducted to evaluate the finite sample performance of our procedures. An application example is also presented.
Funding: Supported by the National Social Science Foundation of China under Grant No. 18BTJ040.
Abstract: This paper discusses the asymptotic properties of the SCAD (smoothly clipped absolute deviation) penalized quasi-likelihood estimator for generalized linear models with adaptive designs, which extends the related results for independent observations to dependent observations. Under certain conditions, the authors prove that the SCAD penalized method correctly selects covariates with nonzero coefficients with probability converging to one, and that the penalized quasi-likelihood estimators of the nonzero coefficients have the same asymptotic distribution they would have if the zero coefficients were known in advance. That is, the SCAD estimator has the consistency and oracle properties. Finally, the results are illustrated by some simulations.
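As a reminder of what the SCAD penalty looks like, the sketch below implements its standard piecewise form with the conventional default a = 3.7. The function name is hypothetical, and the code shows only the penalty itself, not the penalized quasi-likelihood estimator studied in the paper.

```python
def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty in its standard piecewise form (requires a > 2, lam > 0).

    Linear like the lasso near zero, quadratic in the middle region,
    and constant beyond a * lam, so large coefficients are not over-shrunk.
    """
    t = abs(theta)
    if t <= lam:
        return lam * t
    if t <= a * lam:
        return (2.0 * a * lam * t - t * t - lam * lam) / (2.0 * (a - 1.0))
    return lam * lam * (a + 1.0) / 2.0

# The penalty is flat for large coefficients: both values equal 2.35.
flat_1 = scad_penalty(5.0, lam=1.0)
flat_2 = scad_penalty(50.0, lam=1.0)
```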
Funding: Supported by the National Natural Science Foundation of China (10901162); the Fundamental Research Funds for the Central Universities and the Research Funds of Renmin University of China (10XNF073); and the China Postdoctoral Science Foundation (2014M550799).
Abstract: In this paper, we investigate the variable selection problem for generalized regression models. To estimate the regression parameter, a procedure combining the rank correlation method and the adaptive lasso technique is developed, and it is proved to have the oracle properties. A modified IMO (iterative marginal optimization) algorithm, which directly aims to maximize the penalized rank correlation function, is proposed. The performance of the estimation procedure is illustrated by simulation studies.
Abstract: The integer-valued generalized autoregressive conditional heteroskedastic (INGARCH) model is often used to describe count data in biostatistics, such as the number of people infected with dengue fever, daily seizure counts of an epileptic patient, and the number of cases of campylobacteriosis infections. Since the structure of such data is generally high-order and sparse, order shrinkage and selection for the model have attracted much attention. In this paper, we propose a penalized conditional maximum likelihood (PCML) method to solve this problem. The PCML method can effectively select significant orders and estimate the parameters simultaneously. Some simulations and a real data analysis are carried out to illustrate the usefulness of our method.