Funding: Supported by the National Natural Science Foundation of China (Grant No. 11771032), the Natural Science Foundation of Shanxi Province of China (Grant No. 201901D111279), and the Research Grants Council of the Hong Kong Special Administrative Region (Grant Nos. 14301918 and 14302519).
Abstract: As extensions of means, expectiles capture all the distributional information of a random variable. Expectile regression is computationally friendlier than quantile regression because the asymmetric least squares loss function is differentiable everywhere. It also allows effective estimation of the expectiles of a response variable given potential explanatory variables. In this study, we propose the partial functional linear expectile regression model. The slope function and the constant coefficients are estimated using the functional principal component basis. We establish the convergence rate of the slope function estimator and the asymptotic normality of the estimated parameter vector. To examine the effect of the parametric component on the response variable, we develop Wald-type and expectile rank score tests and establish their asymptotic properties. The finite-sample performance of the proposed estimators and test statistics is evaluated through a simulation study. The results indicate that the proposed estimators are comparable to competing estimation methods and that the newly proposed expectile rank score test is useful. The methodology is illustrated with two real data examples.
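To make the estimation step concrete, here is a minimal NumPy sketch of a partial functional linear expectile fit. It rests on our own simplifying assumptions and is not the authors' estimator:

- curves are observed on a common, equally spaced grid;
- FPCA is done by eigen-decomposing the discretized sample covariance;
- the slope function is truncated to the first `n_comp` eigenfunctions;
- all coefficients are fitted by iteratively reweighted asymmetric least squares.

The Wald statistic uses a plain sandwich covariance that treats the FPCA scores as known, so it ignores variability that the paper's theory accounts for. The function names (`als_fit`, `fpca`, `fit_pflr`) and the simulated data are hypothetical.

```python
import numpy as np


def expectile_weights(resid, tau):
    """Weights |tau - 1{resid < 0}| of the asymmetric least squares (ALS) loss."""
    return np.where(resid < 0, 1.0 - tau, tau)


def als_fit(X, y, tau, max_iter=100, tol=1e-10):
    """Linear expectile regression by iteratively reweighted least squares.

    Minimizes sum_i |tau - 1{y_i < x_i'b}| * (y_i - x_i'b)^2. The loss is
    differentiable, so each step is just a weighted least squares solve.
    """
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # OLS start (the tau = 0.5 solution)
    for _ in range(max_iter):
        w = expectile_weights(y - X @ beta, tau)
        XtW = X.T * w
        beta_new = np.linalg.solve(XtW @ X, XtW @ y)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta


def fpca(curves, grid, n_comp):
    """Discretized FPCA on an equally spaced grid: leading eigenfunctions and scores."""
    dt = grid[1] - grid[0]
    centered = curves - curves.mean(axis=0)
    cov = centered.T @ centered / curves.shape[0]
    vals, vecs = np.linalg.eigh(cov * dt)           # eigenvalues in ascending order
    phi = vecs[:, ::-1][:, :n_comp] / np.sqrt(dt)   # eigenfunctions with unit L2 norm
    scores = centered @ phi * dt                    # xi_ij ~ int (X_i - mean X)(t) phi_j(t) dt
    return scores, phi


def fit_pflr(Z, curves, grid, y, tau, n_comp):
    """Hypothetical partial functional linear expectile fit.

    Model sketch: e_tau(y | Z, X) = a + Z'theta + int X(t) beta(t) dt, with
    beta(t) expanded in the first n_comp estimated eigenfunctions.
    Returns (theta_hat, beta_hat on the grid, Wald statistic for theta = 0).
    """
    scores, phi = fpca(curves, grid, n_comp)
    D = np.column_stack([np.ones(len(y)), Z, scores])
    coef = als_fit(D, y, tau)
    p = Z.shape[1]
    theta, b = coef[1:1 + p], coef[1 + p:]
    # Sandwich covariance A^{-1} B A^{-1} / n, treating the FPCA scores as fixed.
    r = y - D @ coef
    w = expectile_weights(r, tau)
    n = len(y)
    A = (D.T * w) @ D / n
    B = (D.T * (w * r) ** 2) @ D / n
    A_inv = np.linalg.inv(A)
    cov_theta = (A_inv @ B @ A_inv / n)[1:1 + p, 1:1 + p]
    wald = float(theta @ np.linalg.solve(cov_theta, theta))  # approx chi2(p) under H0
    return theta, phi @ b, wald


# Simulated example: curves built from three sine eigenfunctions on [0, 1].
rng = np.random.default_rng(0)
n, grid = 2000, np.linspace(0.0, 1.0, 101)
basis = np.array([np.sqrt(2) * np.sin(j * np.pi * grid) for j in (1, 2, 3)])
curves = (rng.normal(size=(n, 3)) * np.sqrt([4.0, 1.0, 0.25])) @ basis
beta_true = basis[0] + 0.5 * basis[1]
Z = rng.normal(size=(n, 2))
theta_true = np.array([1.0, -0.5])
dt = grid[1] - grid[0]
y = Z @ theta_true + curves @ beta_true * dt + rng.normal(size=n)
theta_hat, beta_hat, wald = fit_pflr(Z, curves, grid, y, tau=0.5, n_comp=3)
print(theta_hat, wald)
```

At tau = 0.5 every weight equals 1/2, so `als_fit` reduces to ordinary least squares. Other values of tau only change the weights, which is the practical payoff of a differentiable loss. Under H0: theta = 0, the Wald statistic would be compared with a chi-square quantile with `Z.shape[1]` degrees of freedom.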
Abstract: In this paper, we study large-scale inference for a linear expectile regression model. To mitigate the computational challenges of classical asymmetric least squares (ALS) estimation on massive data, we propose a communication-efficient divide-and-conquer algorithm that combines the information from sub-machines through confidence distributions. The resulting pooled estimator has a closed-form expression, and its consistency and asymptotic normality are established under mild conditions. Moreover, we derive the Bahadur representation of the ALS estimator, which serves as an important tool for studying the relationship between the number of sub-machines K and the sample size. Numerical studies on both synthetic and real data illustrate the finite-sample performance of our method and support the theoretical results.
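The divide-and-conquer idea can be sketched as follows. The sketch makes an assumption the abstract does not spell out: each sub-machine's ALS estimate is treated as an asymptotically normal confidence distribution N(beta_k, V_k), with V_k a sandwich covariance. Combining these gives the inverse-covariance-weighted average, which is one closed form of this kind. The paper's exact pooling formula may differ. The helper names and the simulated data are hypothetical.

```python
import numpy as np


def als_fit(X, y, tau, max_iter=100, tol=1e-10):
    """Linear ALS (expectile) regression via iteratively reweighted least squares."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    for _ in range(max_iter):
        w = np.where(y - X @ beta < 0, 1.0 - tau, tau)
        beta_new = np.linalg.solve((X.T * w) @ X, (X.T * w) @ y)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta


def sandwich_cov(X, y, beta, tau):
    """Estimated asymptotic covariance A^{-1} B A^{-1} / n of the ALS estimator."""
    r = y - X @ beta
    w = np.where(r < 0, 1.0 - tau, tau)
    n = len(y)
    A = (X.T * w) @ X / n
    B = (X.T * (w * r) ** 2) @ X / n
    A_inv = np.linalg.inv(A)
    return A_inv @ B @ A_inv / n


def dc_als(X, y, tau, K):
    """Hypothetical divide-and-conquer ALS estimator.

    Each of K sub-machines fits ALS on its block and returns only
    (beta_k, V_k). Pooling the normal confidence distributions N(beta_k, V_k)
    gives (sum_k V_k^{-1})^{-1} sum_k V_k^{-1} beta_k.
    """
    precision_sum = 0.0
    weighted_sum = 0.0
    for Xk, yk in zip(np.array_split(X, K), np.array_split(y, K)):
        beta_k = als_fit(Xk, yk, tau)
        P_k = np.linalg.inv(sandwich_cov(Xk, yk, beta_k, tau))
        precision_sum = precision_sum + P_k
        weighted_sum = weighted_sum + P_k @ beta_k
    beta_pooled = np.linalg.solve(precision_sum, weighted_sum)
    return beta_pooled, np.linalg.inv(precision_sum)


rng = np.random.default_rng(1)
n, p = 20000, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta_true = np.array([0.5, 1.0, -2.0])
y = X @ beta_true + rng.normal(size=n)
tau = 0.7
beta_full = als_fit(X, y, tau)               # single-machine benchmark
beta_dc, cov_dc = dc_als(X, y, tau, K=20)    # 20 sub-machines of 1000 rows each
print(beta_full, beta_dc)
```

Each sub-machine sends back only a p-vector and a p-by-p matrix, so communication does not grow with the block size. With K = 1, the pooled estimator reduces to the full-sample ALS fit.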
Funding: National Natural Science Foundation of China (Grant No. 11671059).
Abstract: The problem of estimating high-dimensional Gaussian graphical models has gained much attention in recent years. Most existing methods can be regarded as one-step approaches, being either regression-based or likelihood-based. In this paper, we propose a two-step method for estimating the high-dimensional Gaussian graphical model. Specifically, the first step is a screening step, in which many entries of the concentration matrix are identified as zeros and thus removed from further consideration. In the second step, we focus on the remaining entries of the concentration matrix and perform selection and estimation of its nonzero entries. Since the screening step effectively reduces the dimension of the parameter space, the estimation accuracy of the concentration matrix can potentially be improved. We show that the proposed method enjoys desirable asymptotic properties. Numerical comparisons with several existing methods indicate that the proposed method performs well. We also apply the proposed method to a breast cancer microarray data set and obtain some biologically meaningful results.
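The two-step structure can be illustrated with a small NumPy sketch. The abstract does not specify the screening statistic or the second-step estimator, so both are assumptions here:

1. Screening: drop every pair whose absolute sample correlation falls below a threshold.
2. Selection and estimation: run nodewise lasso regression, in the spirit of Meinshausen and Bühlmann, over the surviving candidate pairs only. Coefficients are converted to concentration-matrix entries via Theta_jk = -b_jk / sigma_j^2.

The threshold, the penalty, and the function names are hypothetical.

```python
import numpy as np


def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.copy()
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * b[j]                  # partial residual without predictor j
            rho = X[:, j] @ r / n
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            r -= X[:, j] * b[j]
    return b


def two_step_ggm(data, screen_thresh, lam):
    """Hypothetical two-step estimator of a sparse concentration matrix.

    Step 1 screens out pairs with small |sample correlation| (an assumed rule).
    Step 2 runs nodewise lasso only over the surviving candidates.
    Returns the estimate (on the correlation scale) and the screening mask.
    """
    n, p = data.shape
    Xs = (data - data.mean(axis=0)) / data.std(axis=0)
    corr = Xs.T @ Xs / n
    keep = np.abs(corr) > screen_thresh           # step 1: screening
    np.fill_diagonal(keep, False)
    Theta = np.zeros((p, p))
    for j in range(p):                            # step 2: selection and estimation
        nbrs = np.flatnonzero(keep[j])
        b = lasso_cd(Xs[:, nbrs], Xs[:, j], lam)
        resid = Xs[:, j] - Xs[:, nbrs] @ b
        sigma2 = resid @ resid / n                # residual variance = 1 / Theta_jj
        Theta[j, j] = 1.0 / sigma2
        Theta[j, nbrs] = -b / sigma2              # Theta_jk = -b_jk * Theta_jj
    both = (Theta != 0) & (Theta.T != 0)          # AND rule for a symmetric support
    return np.where(both, (Theta + Theta.T) / 2.0, 0.0), keep


# Simulated chain graph: tridiagonal concentration matrix.
rng = np.random.default_rng(2)
p, n = 10, 500
prec = np.eye(p) + 0.4 * (np.eye(p, k=1) + np.eye(p, k=-1))
data = rng.multivariate_normal(np.zeros(p), np.linalg.inv(prec), size=n)
Theta_hat, keep = two_step_ggm(data, screen_thresh=0.2, lam=0.1)
true_edges = (prec != 0) & ~np.eye(p, dtype=bool)
est_edges = (Theta_hat != 0) & ~np.eye(p, dtype=bool)
print("candidate pairs after screening:", keep.sum() // 2, "of", p * (p - 1) // 2)
```

Because each lasso in step 2 sees only the screened candidates, every nodewise regression is smaller than a full fit on all p - 1 predictors. This is the dimension-reduction effect the abstract describes. The lasso also shrinks the nonzero entries toward zero, so a refit on the selected support would reduce that bias.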
Funding: Supported by the National Natural Science Foundation of China (No. 11171007/A011103) and the Scientific Research Level Improvement Quota Project of Capital University of Economics and Business.
Abstract: In this paper, we analyze ovarian cancer cases from six hospitals in China, screen prognostic factors, and predict the survival rate. A distinctive feature of the data is that all covariates are categorical. We use three methods to estimate the survival rate: traditional Cox regression, two-step Cox regression, and a method based on conditional inference trees. The comparison shows that all three methods are effective and predict the survival curve reasonably well. The analysis results show that the survival rate is determined by a combination of risk factors, with clinical stage being the most important prognostic factor.
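As a rough illustration of the first of the three methods, the sketch below fits a Cox proportional hazards model with dummy-coded categorical covariates. It uses Newton-Raphson on the partial likelihood, with Breslow handling of ties, and then predicts covariate-specific survival curves from the Breslow baseline cumulative hazard. The data are simulated, and the factors `stage` and `grade` are hypothetical. Nothing here reproduces the paper's data or its two-step Cox and conditional inference tree methods.

```python
import numpy as np


def dummy_code(levels, categories):
    """One-hot encode a categorical column, dropping the first category as reference."""
    return np.column_stack([(levels == c).astype(float) for c in categories[1:]])


def cox_fit(X, time, event, n_iter=50, tol=1e-9):
    """Cox PH fit by Newton-Raphson on the Breslow partial log-likelihood."""
    order = np.argsort(-time, kind="stable")      # decreasing time: risk sets are prefixes
    X, time, event = X[order], time[order], event[order].astype(bool)
    last = np.searchsorted(-time, -time, side="right") - 1  # end of each risk set (ties)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        w = np.exp(X @ beta)
        S0 = np.cumsum(w)[last]
        S1 = np.cumsum(w[:, None] * X, axis=0)[last]
        S2 = np.cumsum(w[:, None, None] * X[:, :, None] * X[:, None, :], axis=0)[last]
        xbar = S1 / S0[:, None]
        score = (X - xbar)[event].sum(axis=0)
        info = (S2 / S0[:, None, None] - xbar[:, :, None] * xbar[:, None, :])[event].sum(axis=0)
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    # Breslow baseline cumulative hazard at the event times (increasing order).
    S0 = np.cumsum(np.exp(X @ beta))[last]
    ev_times = time[event][::-1]
    H0 = np.cumsum((1.0 / S0[event])[::-1])
    return beta, ev_times, H0


def predict_survival(beta, ev_times, H0, x, t):
    """S(t | x) = exp(-H0(t) * exp(x'beta)), with H0 a right-continuous step function."""
    idx = np.searchsorted(ev_times, t, side="right") - 1
    H = np.where(idx >= 0, H0[np.maximum(idx, 0)], 0.0)
    return np.exp(-H * np.exp(x @ beta))


rng = np.random.default_rng(3)
n = 2000
stages = ["I", "II", "III", "IV"]
stage = rng.choice(np.array(stages), size=n)
grade = rng.choice(np.array(["low", "high"]), size=n)
X = np.column_stack([dummy_code(stage, stages), dummy_code(grade, ["low", "high"])])
beta_true = np.array([0.5, 1.0, 1.5, 0.7])        # log hazard ratios vs stage I, low grade
t_event = rng.exponential(1.0 / (0.1 * np.exp(X @ beta_true)))
t_cens = rng.exponential(10.0, size=n)
time = np.minimum(t_event, t_cens)
event = t_event <= t_cens
beta_hat, ev_times, H0 = cox_fit(X, time, event)
print(np.round(beta_hat, 2))
```

Dummy coding with a reference level is what lets categorical-only covariates enter the Cox model directly. Each coefficient is a log hazard ratio relative to the reference category, here stage I and low grade.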