Abstract: In this paper, a model averaging method is proposed for varying-coefficient models with responses missing at random, based on a weight selection criterion derived from cross-validation. Under certain regularity conditions, the proposed method is proved to be asymptotically optimal in the sense of achieving the minimum squared error.
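Choosing averaging weights by cross-validation when some responses are missing can be sketched as follows. This is a minimal illustration, not the paper's criterion. It assumes two candidate models whose leave-one-out predictions are already available, known selection probabilities `pi`, an inverse-probability-weighted CV loss, and a grid search over the single weight. The function name `ipw_cv_weight` is hypothetical.

```python
def ipw_cv_weight(y, delta, pi, pred_a, pred_b, grid_size=101):
    """Pick w in [0, 1] minimising an IPW-weighted CV criterion for the
    averaged prediction w * pred_a + (1 - w) * pred_b (illustrative only).

    y       : responses (ignored where delta == 0, so may hold None there)
    delta   : 1 if the response is observed, 0 if it is missing
    pi      : selection probabilities P(delta = 1 | x), assumed known here
    pred_a, pred_b : leave-one-out predictions from two candidate models
    """
    best_w, best_cv = 0.0, float("inf")
    for k in range(grid_size):
        w = k / (grid_size - 1)
        cv = 0.0
        for yi, di, pii, a, b in zip(y, delta, pi, pred_a, pred_b):
            if di:  # only observed responses contribute, reweighted by 1 / pi
                resid = yi - (w * a + (1 - w) * b)
                cv += resid * resid / pii
        if cv < best_cv:  # keep the first weight attaining the minimum
            best_w, best_cv = w, cv
    return best_w, best_cv
```

A grid search is used only because two candidate models give a one-dimensional weight. With more candidates, such criteria are usually minimised by quadratic programming over the weight simplex.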
Funding: Supported by the Fundamental Research Funds for the Central Universities (17CX02035A); NNSF of China (11601197, 11461029, 61563018); China Postdoctoral Science Foundation funded project (2016M600511, 2017T100475); NSF of Jiangxi Province (20171ACB21030, 20161BAB201024, 20161ACB200009); and the Key Science Fund Project of Jiangxi Provincial Education Department (GJJ150439).
Abstract: Conditional independence is a basic assumption in many fields of statistics, and testing its validity is of great importance and has been studied extensively in the literature. However, all existing methods assume fully observed data, and none appears to account for missing data. Motivated by this, this paper develops two test statistics for this setting based on inverse probability weighting and augmented inverse probability weighting. The asymptotic distributions of the proposed statistics under the null hypothesis are derived. Simulation studies indicate that both test statistics perform well in terms of size and power.
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 11871287, 11501208, 11771144 and 11801359; the Natural Science Foundation of Tianjin under Grant No. 18JCYBJC41100; the Fundamental Research Funds for the Central Universities; and the Key Laboratory for Medical Data Analysis and Statistical Research of Tianjin.
Abstract: Empirical-likelihood-based inference for parameters defined by the general estimating equations of Qin and Lawless (1994) remains an active research topic. When the response is missing at random (MAR) and the covariate dimension is not low, the authors propose a two-stage estimation procedure. It combines dimension-reduced kernel estimators with an unbiased estimating function based on augmented inverse probability weighting and multiple imputation (AIPW-MI). The authors show that the resulting estimator is consistent and asymptotically normal. In addition, the corresponding empirical likelihood ratio statistics, evaluated at the true parameter, asymptotically follow central chi-square distributions. The finite-sample performance of the proposed estimator is studied through simulation, and an application to an HIV-CD4 data set is presented.
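The sketch below shows the simplest augmented inverse probability weighting (AIPW) estimating function, next to plain IPW. It targets a population mean rather than the general estimating equations of the paper. It assumes the selection probabilities `pi` and the outcome-regression predictions `m` have already been fitted by some working models. The multiple-imputation and dimension-reduction steps of the paper are omitted, and the function names are illustrative.

```python
def ipw_mean(y, delta, pi):
    """IPW estimate of E[Y] when responses are missing at random."""
    n = len(delta)
    # only observed responses (delta == 1) enter, each reweighted by 1 / pi
    return sum(yi / p for yi, d, p in zip(y, delta, pi) if d) / n


def aipw_mean(y, delta, pi, m):
    """AIPW estimate of E[Y]: the root in mu of
    sum_i [delta_i (Y_i - mu) / pi_i - (delta_i - pi_i) / pi_i * (m_i - mu)] = 0,
    which simplifies to the average of the terms accumulated below.
    """
    n = len(delta)
    total = 0.0
    for yi, d, p, mi in zip(y, delta, pi, m):
        ipw_term = yi / p if d else 0.0
        # augmentation term: uses the regression prediction m_i for every subject
        total += ipw_term - (d - p) / p * mi
    return total / n
```

The AIPW form is doubly robust: it remains consistent if either the model for `pi` or the model for `m` is correctly specified.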
Funding: Supported in part by the National Social Science Foundation of China (Grant No. 20BTJ049).
Abstract: In this paper, we consider weighted local polynomial calibration estimation and imputation estimation of a nonparametric function when the data are right censored and the censoring indicators are missing at random, and we establish the asymptotic normality of these estimators. As applications, we derive weighted local linear calibration estimators and imputation estimators of the conditional distribution function, the conditional density function and the conditional quantile function, and investigate their asymptotic normality. Finally, simulation studies are conducted to illustrate the finite-sample performance of the estimators.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 11771431, 11690015, 11926341, 11601080 and 11671275), the Key Laboratory of Random Complex Structures and Data Science, Chinese Academy of Sciences (Grant No. 2008DP173182), and the Fundamental Research Funds for the Central Universities in University of International Business and Economics (Grant No. CXTD10-09).
Abstract: Missing covariate data arise frequently in biomedical studies. In this article, we propose a class of weighted estimating equations for the additive hazards regression model when some of the covariates are missing at random. Time-specific and subject-specific weights are incorporated into the weighted estimating equations. Unified results are established for estimating the selection probabilities under both parametric and nonparametric modelling schemes. The resulting estimators have closed forms and are shown to be consistent and asymptotically normal. Simulation studies indicate that the proposed estimators perform well in practical settings, and an application to a mouse leukemia study is presented.
Abstract: In this thesis, we construct nonlinear wavelet density estimators for data missing at random in the presence of covariates, and study their asymptotic properties. The main advantage of the nonlinear wavelet method is its ability to estimate non-smooth functions, which classical kernel estimation cannot do. We also study the large-sample properties of the integrated squared error (ISE) of the hazard rate estimator.
Abstract: In this article, nonlinear regression models with missing responses are studied with the aim of improving on the doubly robust estimator. Based on the covariate balancing propensity score (CBPS), estimators of the regression coefficients and the population mean are obtained and proved to be asymptotically normal. In simulation studies, the proposed estimators perform better than the usual augmented inverse probability weighted estimators.
Abstract: In this study, we investigate the effects of missing data on the estimation of HIV/TB co-infection. We revisit the concept of missing data and examine three available approaches for dealing with missingness. The main objective is to identify the best method for correcting missing data in the TB/HIV co-infection setting. We use both empirical data analysis and an extensive simulation study to examine the effects of missing data. For each approach, we compare accuracy, sensitivity, specificity, and training and test errors. The novelty of this work lies in the use of modern statistical learning algorithms to treat missingness. In the empirical analysis, imputations were performed on both HIV data and TB-HIV co-infection data, with the missing values imputed by the different approaches. In the simulation study, 0% (complete case), 10%, 30%, 50% and 80% of the data were randomly selected and replaced with missing values. The estimated co-infection rates (95% confidence interval) were as follows:
- complete cases only: 29% (25%, 33%);
- weighted method: 27% (23%, 31%);
- likelihood-based approach: 26% (24%, 28%);
- multiple imputation (MI): 21% (20%, 22%).

In conclusion, MI remains the best approach for dealing with missing data. Failing to apply it results in overestimating the HIV/TB co-infection rate by 8 percentage points.
Abstract: In this paper, three smoothed empirical log-likelihood ratio functions are proposed for the parameters of nonlinear models with missing responses. Under some regularity conditions, the corresponding Wilks phenomenon is established, so confidence regions for the parameters can be constructed easily.
Funding: This research is supported by the National Natural Science Foundation of China under Grant Nos. 10661003 and 10971038, and the Natural Science Foundation of Guangxi under Grant No. 2010GXNSFA013117.
Abstract: This paper considers two estimators of θ = g(x) in the nonparametric regression model Y = g(x) + ε, x ∈ (0, 1)^p, with missing responses: imputation and inverse probability weighted estimators. Asymptotic normality of the two estimators is established and used to construct normal-approximation-based confidence intervals for θ.
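The two estimators can be illustrated with a Nadaraya-Watson smoother in one dimension (p = 1). This is a sketch under stated assumptions, not the paper's construction. The Gaussian kernel, the bandwidth `h`, the known selection probabilities `pi`, and the particular IPW form (dividing by the kernel sum over all subjects) are choices made here for illustration. The function names are hypothetical.

```python
import math


def _kernel(u):
    # Gaussian kernel without its normalising constant (it cancels in the ratios)
    return math.exp(-0.5 * u * u)


def nw(x0, xs, ys, h):
    """Nadaraya-Watson estimate of E[Y | X = x0] from paired data."""
    ks = [_kernel((x0 - xi) / h) for xi in xs]
    return sum(k * y for k, y in zip(ks, ys)) / sum(ks)


def ipw_estimate(x0, xs, ys, delta, pi, h):
    """IPW kernel estimate: observed responses weighted by 1 / pi_i."""
    ks = [_kernel((x0 - xi) / h) for xi in xs]
    num = sum(k * y / p for k, y, d, p in zip(ks, ys, delta, pi) if d)
    return num / sum(ks)


def imputation_estimate(x0, xs, ys, delta, h):
    """Imputation estimate: fill missing responses, then smooth all data."""
    obs_x = [x for x, d in zip(xs, delta) if d]
    obs_y = [y for y, d in zip(ys, delta) if d]
    # Step 1: replace each missing response by a complete-case kernel fit at its x
    filled = [y if d else nw(x, obs_x, obs_y, h) for x, y, d in zip(xs, ys, delta)]
    # Step 2: smooth the completed data set
    return nw(x0, xs, filled, h)
```

With every response observed and `pi` equal to 1, both estimators reduce to the ordinary Nadaraya-Watson fit.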
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 11171188, 11201499 and 10921101), the Natural Science Foundation of Shandong Province (Grant Nos. ZR2010AZ001 and ZR2011AQ007), the Shandong Provincial Scientific Research Reward Foundation for Excellent Young and Middle-Aged Scientists (Grant No. BS2011SF006), and the K.C. Wong-HKBU Fellowship Program for Mainland Visiting Scholars 2010-11.
Abstract: In this article, empirical likelihood inference for estimating equations with missing data is considered. Based on the weighted-corrected estimating function, the empirical log-likelihood ratio is proved to be asymptotically standard chi-square distributed under suitable conditions. This result differs from those derived previously, and it makes confidence regions for the parameters of interest convenient to construct. We also prove that the proposed maximum empirical likelihood estimator θ is asymptotically normal and attains the semiparametric efficiency bound for missing data. Simulations indicate that the proposed method performs best in the settings considered.
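The empirical log-likelihood ratio can be computed by solving a one-dimensional equation for a Lagrange multiplier. The sketch below does this by bisection for a scalar estimating function. It assumes the values g_i = g(X_i, θ) have already been computed, for example from a weighted-corrected estimating function evaluated at each observation. It is not the paper's code, and the function name is hypothetical.

```python
import math


def el_neg2_log_ratio(g, iters=200):
    """Return -2 log R for H0: E[g] = 0, where
    R = max prod(n p_i) subject to p_i >= 0, sum p_i = 1, sum p_i g_i = 0.
    The optimal weights are p_i = 1 / (n (1 + lam g_i)), with lam solving
    sum g_i / (1 + lam g_i) = 0, so -2 log R = 2 sum log(1 + lam g_i).
    """
    lo_g, hi_g = min(g), max(g)
    if lo_g == 0.0 and hi_g == 0.0:
        return 0.0  # every g_i is zero: uniform weights already satisfy H0
    if lo_g >= 0.0 or hi_g <= 0.0:
        return math.inf  # zero is not interior to the convex hull, so R = 0
    # lam must keep every 1 + lam * g_i strictly positive
    lo, hi = -1.0 / hi_g, -1.0 / lo_g

    def score(lam):
        # strictly decreasing in lam on (lo, hi)
        return sum(gi / (1.0 + lam * gi) for gi in g)

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return 2.0 * sum(math.log1p(lam * gi) for gi in g)
```

For a scalar parameter, a hypothesised value lies inside the 95% confidence region when this statistic is at most 3.841, the 0.95 quantile of the χ²₁ distribution.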
Funding: Supported by the National Natural Science Foundation of China (Grant No. 11301031).
Abstract: Multiply robust inference has recently attracted much attention in the context of missing response data. An estimation procedure is multiply robust if it can incorporate information from multiple candidate models and the resulting estimator is consistent as long as one of the candidate models is correctly specified. This property is appealing because it gives the user a flexible modeling strategy with better protection against model misspecification. We explore this property for regression models with a binary covariate that is missing at random. We start from a reformulation of the celebrated augmented inverse probability weighted estimating equation. Based on it, we propose a novel combination of least squares and empirical likelihood that separately handles each of the two types of multiple candidate models: one for the regression of the missing variable and the other for the missingness mechanism. Owing to this separation, all the working models are fused concisely and effectively. The asymptotic normality of our estimator is established through the theory of estimating functions with plugged-in nuisance parameter estimates. The finite-sample performance of our procedure is illustrated through simulation studies and through the analysis of dementia data collected by the National Alzheimer's Coordinating Center.
Abstract: This paper discusses regression analysis of right-censored failure time data when the censoring indicators are missing for some subjects. Several methods have been developed for this analysis under different situations. In particular, Goetghebeur and Ryan considered the situation where both the failure time and the censoring time marginally follow proportional hazards models, and developed an estimating equation approach. One limitation of their approach is that the two baseline hazard functions are assumed to be proportional to each other. We consider the same problem and present an efficient estimation procedure for the regression parameters that does not require the proportionality assumption. An EM algorithm is developed, and a simulation study indicates that the proposed methodology performs well in practical situations. An illustrative example is provided.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 12071348) and the National Social Science Foundation of China (Grant No. 17BTJ032).
Abstract: In this paper, we use the empirical likelihood method to investigate the two-sample quantile difference when the responses of the two populations, which have high-dimensional covariates, are missing at random. In particular, based on a sufficient dimension reduction technique, we construct three empirical log-likelihood ratios for the quantile difference between the two samples. They use inverse probability weighting imputation, regression imputation and augmented inverse probability weighting imputation, respectively, and we prove their asymptotic distributions. We also give a test of whether the two populations have the same distribution. A simulation study is carried out to investigate the finite-sample behavior of the proposed methods.
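A basic ingredient of such procedures is an inverse-probability-weighted estimate of each population's quantile. The sketch below computes a normalised IPW quantile for each sample and takes their difference. It assumes the selection probabilities `pi` are known, whereas the paper estimates them after sufficient dimension reduction. It also stops short of the empirical likelihood ratios. The function names are hypothetical.

```python
def ipw_quantile(y, delta, pi, tau):
    """tau-quantile of the IPW-weighted empirical distribution of the observed y."""
    # keep observed responses only, each carrying weight 1 / pi_i
    obs = sorted((yi, 1.0 / p) for yi, d, p in zip(y, delta, pi) if d)
    total = sum(w for _, w in obs)
    cum = 0.0
    for yi, w in obs:
        cum += w
        # first value at which the normalised weighted CDF reaches tau
        if cum / total >= tau:
            return yi
    return obs[-1][0]  # guards against rounding when tau is 1


def ipw_quantile_difference(sample1, sample2, tau):
    """Difference of the tau-quantiles; each sample is a (y, delta, pi) tuple."""
    return ipw_quantile(*sample1, tau) - ipw_quantile(*sample2, tau)
```

Normalising by the total weight keeps the estimated distribution function between 0 and 1 even when the estimated probabilities are noisy.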
Funding: Supported by the National Natural Science Foundation of China (Nos. 11271088, 11361011, 11201088) and the Natural Science Foundation of Guangxi (Nos. 2013GXNSFAA019004, 2013GXNSFAA019007, 2013GXNSFBA019001).
Abstract: Consider two linear models X_i = U_i'β + e_i and Y_j = V_j'γ + η_j with response variables missing at random (MAR). In this paper, we use inverse probability weighted imputation to produce 'complete' data sets for X and Y. Based on these data sets, we construct an empirical likelihood (EL) statistic for the difference between X and Y (denoted Δ). We show that the EL statistic has a limiting χ² distribution and use this result to construct a confidence interval for Δ. Results of a simulation study on the finite-sample performance of the EL-based confidence intervals for Δ are reported.
Funding: Supported by the China Postdoctoral Science Foundation under Grant No. 2019M651422; the National Natural Science Foundation of China under Grant Nos. 71701127, 11831008 and 11971171; the National Social Science Foundation Key Program under Grant No. 17ZDA091; the 111 Project of China under Grant No. B14019; the Natural Science Foundation of Shanghai under Grant Nos. 17ZR1409000 and 20ZR1423000; and the Project of Humanities and Social Science Foundation of Ministry of Education under Grant No. 20YJC910003.
Abstract: The problem of hazard rate estimation under right censoring has been investigated extensively. The integrated squared error (ISE) is one of the most widely accepted measures of the global performance of nonparametric kernel estimators. However, no results have so far been available on the ISE of hazard rate estimation in the right-censored model with censoring indicators missing at random (MAR). This paper constructs an imputation estimator of the hazard rate function and establishes the asymptotic normality of the ISE of the kernel hazard rate estimator with censoring indicators MAR. In addition, an asymptotic representation of the mean integrated squared error (MISE) is presented. The finite-sample behavior of the estimator is investigated via a simple simulation.
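One simple way to build such an imputation estimator is to kernel-smooth Nelson-Aalen increments after replacing each missing censoring indicator by an estimate of P(δ = 1 | Z). The sketch below assumes those estimates `p_hat` are supplied, uses the Epanechnikov kernel, and assumes no tied times. It illustrates the general idea rather than reproducing the paper's estimator, and the function name is hypothetical.

```python
def epanechnikov(u):
    # K(u) = 0.75 (1 - u^2) on [-1, 1], zero elsewhere
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def imputed_hazard(t, times, delta, observed, p_hat, h):
    """Kernel-smoothed Nelson-Aalen hazard estimate at time t.

    times    : observed follow-up times Z_i = min(T_i, C_i), assumed untied
    delta    : censoring indicators (ignored where observed[i] == 0)
    observed : 1 if delta[i] is known, 0 if it is missing
    p_hat    : estimates of P(delta = 1 | Z_i) used to impute missing indicators
    """
    n = len(times)
    imputed = [d if o else p for d, o, p in zip(delta, observed, p_hat)]
    order = sorted(range(n), key=lambda i: times[i])
    est = 0.0
    for rank, i in enumerate(order):
        at_risk = n - rank  # subjects still at risk at this ordered time
        est += epanechnikov((t - times[i]) / h) * imputed[i] / at_risk
    return est / h
```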
Funding: Supported by the National Natural Science Foundation of China (Nos. 11271088, 11361011, 11201088), the Guangxi "Bagui Scholar" Special Project Foundation, and the Natural Science Foundation of Guangxi (Nos. 2013GXNSFAA019004, 2013GXNSFAA019007, 2013GXNSFBA019001).
Abstract: Suppose that we have a partially linear model Y_i = x_iβ + g(t_i) + ε_i with independent zero-mean errors ε_i. Here {(x_i, t_i), i = 1, ..., n} are non-random and completely observed, and {Y_i, i = 1, ..., n} are missing at random (MAR). Two types of estimators of β and of g(t) for fixed t are investigated: estimators based on semiparametric regression and on inverse probability weighted imputation. Asymptotic normality of the estimators is established and used to construct normal-approximation-based confidence intervals for β and g(t). Results of a simulation study on the finite-sample performance of the proposed estimators and confidence intervals are reported.