Abstract: In this paper, a model averaging method is proposed for varying-coefficient models with responses missing at random, based on a weight selection criterion derived from cross-validation. Under certain regularity conditions, the proposed method is proved to be asymptotically optimal in the sense of achieving the minimum squared error.
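As a rough illustration of cross-validation weight selection, the sketch below handles only the simplest case: two candidate models and a scalar weight. It assumes the leave-one-out predictions `pred1` and `pred2` have already been computed. It can optionally weight each squared error by a hypothetical `obs_weight`, such as δ_i/π_i (zero when the response is missing). This is a simplified stand-in, not the paper's criterion for varying-coefficient models.

```python
def cv_model_average_weight(y, pred1, pred2, obs_weight=None):
    """Return w in [0, 1] minimising the weighted squared CV error of
    w * pred1 + (1 - w) * pred2.

    y          : responses (entries with zero weight may be None)
    pred1/pred2: leave-one-out predictions from two candidate models
    obs_weight : hypothetical per-observation weights, e.g. delta_i / pi_i
    """
    if obs_weight is None:
        obs_weight = [1.0] * len(y)
    num = 0.0
    den = 0.0
    for yi, p1, p2, c in zip(y, pred1, pred2, obs_weight):
        if c == 0:
            continue              # missing response: no contribution
        a = p1 - p2               # how far model 1 departs from model 2
        r = yi - p2               # CV residual of model 2
        num += c * a * r
        den += c * a * a
    if den == 0:
        return 0.5                # identical predictions: every w is optimal
    # The criterion is a convex quadratic in w, so clipping the
    # unconstrained minimiser to [0, 1] gives the constrained minimiser.
    return min(1.0, max(0.0, num / den))


# Toy numbers: y lies halfway between the two sets of predictions.
print(cv_model_average_weight([1.0, 2.0], [2.0, 4.0], [0.0, 0.0]))  # 0.5
```

With more than two candidate models, the same criterion becomes a quadratic program over the weight simplex rather than a single clipped ratio.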
Funding: Supported by the Fundamental Research Funds for the Central Universities (17CX02035A), the NNSF of China (11601197, 11461029, 61563018), the China Postdoctoral Science Foundation funded project (2016M600511, 2017T100475), the NSF of Jiangxi Province (20171ACB21030, 20161BAB201024, 20161ACB200009), and the Key Science Fund Project of the Jiangxi Provincial Education Department (GJJ150439).
Abstract: Conditional independence is a basic assumption in many fields of statistics, so testing its validity is of great importance and has been studied extensively in the literature. However, all existing methods assume fully observed data, and none appears to address the case where some data are missing. Motivated by this, this paper develops two test statistics for that setting, based on inverse probability weighting (IPW) and augmented inverse probability weighting (AIPW). The asymptotic distributions of the proposed statistics under the null hypothesis are also derived. Simulation studies indicate that both test statistics perform well in terms of size and power.
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 11871287, 11501208, 11771144 and 11801359, the Natural Science Foundation of Tianjin under Grant No. 18JCYBJC41100, the Fundamental Research Funds for the Central Universities, and the Key Laboratory for Medical Data Analysis and Statistical Research of Tianjin.
Abstract: Empirical-likelihood-based inference for parameters defined by the general estimating equations of Qin and Lawless (1994) remains an active research topic. When the response is missing at random (MAR) and the covariate is not low-dimensional, the authors propose a two-stage estimation procedure. It combines dimension-reduced kernel estimators with an unbiased estimating function based on augmented inverse probability weighting and multiple imputation (AIPW-MI). The authors show that the resulting estimator is consistent and asymptotically normal. In addition, the corresponding empirical likelihood ratio statistics asymptotically follow central chi-square distributions when evaluated at the true parameter. The finite-sample performance of the proposed estimator is studied through simulation, and an application to an HIV-CD4 data set is also presented.
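For readers less familiar with AIPW, the sketch below shows the basic IPW and AIPW estimators of a population mean E[Y] under MAR. The inputs `pi` (selection probabilities) and `m` (outcome-model predictions of E[Y | X]) are assumed to have been estimated already. The paper's two-stage procedure with dimension-reduced kernel estimators and multiple imputation is not reproduced here.

```python
def ipw_mean(y, delta, pi):
    """IPW estimate of E[Y]: (1/n) * sum(delta_i * y_i / pi_i)."""
    n = len(y)
    return sum(yi / p for yi, d, p in zip(y, delta, pi) if d) / n


def aipw_mean(y, delta, pi, m):
    """AIPW estimate of E[Y]:
    (1/n) * sum(delta_i * y_i / pi_i - (delta_i / pi_i - 1) * m_i).

    y     : responses, None where missing (delta_i = 0)
    delta : 1 if y_i observed, else 0
    pi    : estimated P(delta_i = 1 | X_i)
    m     : estimated E[Y | X_i] from an outcome model
    """
    n = len(y)
    total = 0.0
    for yi, d, p, mi in zip(y, delta, pi, m):
        ipw_term = yi / p if d else 0.0
        total += ipw_term - (d / p - 1.0) * mi   # augmentation term
    return total / n


# Toy data: the second response is missing.
y, delta, pi, m = [2.0, None], [1, 0], [0.5, 0.5], [1.0, 3.0]
print(ipw_mean(y, delta, pi), aipw_mean(y, delta, pi, m))  # 2.0 3.0
```

The augmentation term is what makes AIPW doubly robust: the estimate stays consistent if either `pi` or `m` comes from a correctly specified model.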
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 11771431, 11690015, 11926341, 11601080 and 11671275), the Key Laboratory of Random Complex Structures and Data Science, Chinese Academy of Sciences (Grant No. 2008DP173182), and the Fundamental Research Funds for the Central Universities in University of International Business and Economics (Grant No. CXTD10-09).
Abstract: Missing covariate data arise frequently in biomedical studies. In this article, we propose a class of weighted estimating equations for the additive hazards regression model when some of the covariates are missing at random. Time-specific and subject-specific weights are incorporated into the weighted estimating equations. Unified results are established for estimating the selection probabilities under both parametric and nonparametric modelling schemes. The resulting estimators have closed forms and are shown to be consistent and asymptotically normal. Simulation studies indicate that the proposed estimators perform well in practical settings. An application to a mouse leukemia study is presented.
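A nonparametric estimate of a selection probability can be as simple as a kernel-weighted proportion of observed cases. The sketch below uses a Nadaraya-Watson estimator with a Gaussian kernel and a scalar, always-observed covariate `x`. The kernel, the bandwidth `h` and the function name are illustrative assumptions, not the paper's choices. A parametric alternative would be a logistic regression of δ on x.

```python
import math


def kernel_selection_prob(x0, x, delta, h):
    """Nadaraya-Watson estimate of pi(x0) = P(delta = 1 | X = x0).

    x     : always-observed scalar covariate values
    delta : 1 if the (possibly missing) covariate was observed, else 0
    h     : bandwidth (an illustrative tuning parameter)
    """
    num = 0.0
    den = 0.0
    for xi, d in zip(x, delta):
        # Gaussian kernel; its normalising constant cancels in the ratio.
        k = math.exp(-0.5 * ((x0 - xi) / h) ** 2)
        num += k * d
        den += k
    return num / den if den > 0 else float("nan")


x = [0.0, 2.0]
delta = [1, 0]
print(kernel_selection_prob(1.0, x, delta, h=1.0))  # 0.5
```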
Funding: Supported in part by the National Social Science Foundation of China (Grant No. 20BTJ049).
Abstract: In this paper, we consider weighted local polynomial calibration estimation and imputation estimation of a nonparametric function when the data are right censored and the censoring indicators are missing at random, and we establish the asymptotic normality of these estimators. As applications, we derive weighted local linear calibration estimators and imputation estimators of the conditional distribution function, the conditional density function and the conditional quantile function, and investigate their asymptotic normality. Finally, simulation studies are conducted to illustrate the finite-sample performance of the estimators.
Abstract: In this thesis, we construct nonlinear wavelet density estimators and study their asymptotic properties when data are missing at random in the presence of covariates. The main advantage of the nonlinear wavelet method is that it can estimate non-smooth functions, which classical kernel estimation cannot do. We also study the large-sample properties of the integrated squared error (ISE) of the hazard rate estimator.
Abstract: In this article, nonlinear regression models with missing responses are studied with the aim of improving the doubly robust estimator. Based on the covariate balancing propensity score (CBPS), estimators of the regression coefficients and of the population mean are obtained and proved to be asymptotically normal. In simulation studies, the proposed estimators perform better than the usual augmented inverse probability weighted estimators.
Abstract: In this study, we investigate the effects of missing data on the estimation of HIV/TB co-infection. We revisit the concept of missing data and examine three available approaches for handling missingness. The main objective is to identify the best method for correcting missing data in the TB/HIV co-infection setting. We use both empirical data analysis and an extensive simulation study to examine the effects of missing data. We also compare the accuracy, sensitivity, specificity, and training and test errors of the different approaches. The novelty of this work lies in the use of modern statistical learning algorithms to treat missingness. In the empirical analysis, both the HIV data and the TB-HIV co-infection data were imputed, with the missing values filled in by the different approaches. In the simulation study, 0% (complete case), 10%, 30%, 50% and 80% of the data were randomly selected and replaced with missing values. The complete-case analysis gave a co-infection rate (95% confidence interval) of 29% (25%, 33%). The weighted method gave 27% (23%, 31%), the likelihood-based approach 26% (24%, 28%), and the multiple imputation (MI) approach 21% (20%, 22%). In conclusion, MI remains the best approach for dealing with missing data, and failing to apply it results in overestimating the HIV/TB co-infection rate by 8 percentage points.
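The multiple imputation results above come from pooling several completed-data analyses. The study does not spell out its pooling step. A standard choice is Rubin's rules, sketched below under that assumption. Each entry of `estimates` and `variances` is a point estimate and its squared standard error from one imputed data set, and at least two imputations are required.

```python
from statistics import mean, variance


def rubin_pool(estimates, variances):
    """Pool m >= 2 completed-data analyses with Rubin's rules.

    estimates : point estimates from each imputed data set
    variances : squared standard errors from each imputed data set
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2:
        raise ValueError("need at least two imputations")
    q_bar = mean(estimates)          # pooled point estimate
    w_bar = mean(variances)          # average within-imputation variance
    b = variance(estimates)          # between-imputation variance (divisor m - 1)
    total = w_bar + (1 + 1 / m) * b  # Rubin's total variance
    return q_bar, total


q, t = rubin_pool([1.0, 2.0, 3.0], [1.0, 1.0, 1.0])
print(q, round(t, 4))  # 2.0 2.3333
```

Interval estimates would normally use a t reference distribution with Rubin's degrees of freedom, which this sketch omits.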
Abstract: In this paper, three smoothed empirical log-likelihood ratio functions are proposed for the parameters of nonlinear models with missing responses. Under some regularity conditions, the corresponding Wilks phenomena are established, so confidence regions for the parameters can be constructed easily.
Abstract: This paper discusses regression analysis of right-censored failure time data when the censoring indicators are missing for some subjects. Several methods have been developed for this analysis under different situations. In particular, Goetghebeur and Ryan considered the case where both the failure time and the censoring time marginally follow proportional hazards models, and they developed an estimating equation approach. One limitation of their approach is that the two baseline hazard functions are assumed to be proportional to each other. We consider the same problem and present an efficient estimation procedure for the regression parameters that does not require the proportionality assumption. An EM algorithm is developed, and a simulation study indicates that the proposed methodology performs well in practical situations. An illustrative example is provided.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 12071348) and the National Social Science Foundation of China (Grant No. 17BTJ032).
Abstract: In this paper, we use the empirical likelihood method to study the two-sample quantile difference when, in both populations, the responses are missing at random and the covariates are high-dimensional. Using sufficient dimension reduction, we construct three empirical log-likelihood ratios for the quantile difference between the two samples. They are based on inverse probability weighting imputation, regression imputation and augmented inverse probability weighting imputation, respectively, and we prove their asymptotic distributions. We also give a test of whether the two populations have the same distribution. A simulation study is carried out to investigate the finite-sample behavior of the proposed methods.
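The simplest of the three constructions, inverse probability weighting, can be sketched as a weighted empirical quantile: each observed response gets weight 1/π_i, and missing responses are dropped. The sketch assumes the selection probabilities `pi` are known or already estimated. It omits the sufficient dimension reduction and the empirical likelihood step, and the function names are hypothetical.

```python
def ipw_quantile(y, delta, pi, tau):
    """tau-quantile of the IPW-weighted empirical distribution.

    Observed responses (delta_i = 1) get weight 1 / pi_i; missing ones are
    dropped. Returns the smallest observed y whose weighted CDF reaches tau.
    """
    pts = sorted((yi, 1.0 / p) for yi, d, p in zip(y, delta, pi) if d)
    if not pts:
        raise ValueError("no observed responses")
    total = sum(w for _, w in pts)
    cum = 0.0
    for yi, w in pts:
        cum += w
        if cum >= tau * total - 1e-12:   # small tolerance for rounding
            return yi
    return pts[-1][0]


def ipw_quantile_difference(sample1, sample2, tau):
    """Plug-in estimate of q1(tau) - q2(tau); each sample is (y, delta, pi)."""
    return ipw_quantile(*sample1, tau) - ipw_quantile(*sample2, tau)


s1 = ([1.0, 2.0, None, 4.0], [1, 1, 0, 1], [1.0, 0.5, 0.5, 1.0])
s2 = ([0.0, 1.0, 2.0, 3.0], [1, 1, 1, 1], [1.0, 1.0, 1.0, 1.0])
print(ipw_quantile_difference(s1, s2, 0.5))  # 1.0
```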
Funding: Supported by the National Natural Science Foundation of China (No. 11271088, 11361011, 11201088) and the Natural Science Foundation of Guangxi (No. 2013GXNSFAA019004, 2013GXNSFAA019007, 2013GXNSFBA019001).
Abstract: Consider two linear models $X_i = U_i^{\top}\beta + e_i$ and $Y_j = V_j^{\top}\gamma + \eta_j$ whose responses are missing at random (MAR). In this paper, we use inverse probability weighted imputation to produce "complete" data sets for $X$ and $Y$. Based on these data sets, we construct an empirical likelihood (EL) statistic for the difference between $X$ and $Y$ (denoted $\Delta$). We show that the EL statistic has a limiting $\chi^2_1$ distribution, which is used to construct a confidence interval for $\Delta$. Results of a simulation study on the finite-sample performance of the EL-based confidence intervals for $\Delta$ are reported.
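The empirical likelihood step can be illustrated in its most basic form: Owen's EL ratio for a single mean. It is computed on values `z` that are assumed to have already been "completed" by some imputation. The Lagrange multiplier is found by bisection. A 95% interval is then read off by inverting the statistic against the chi-squared (1 df) critical value 3.841. This is a one-sample sketch only; it does not reproduce the paper's two-sample statistic for Δ.

```python
import math


def el_statistic(z, mu, tol=1e-12, max_iter=200):
    """-2 log empirical likelihood ratio for the mean of z at mu (Owen's EL)."""
    d = [zi - mu for zi in z]
    d_min, d_max = min(d), max(d)
    if d_min >= 0 or d_max <= 0:
        return math.inf                    # mu outside the convex hull of z
    # The multiplier must keep every 1 + lam * d_i positive.
    lo, hi = -1.0 / d_max, -1.0 / d_min
    eps = 1e-10 * (hi - lo)
    lo, hi = lo + eps, hi - eps
    for _ in range(max_iter):              # bisection on a decreasing function
        lam = 0.5 * (lo + hi)
        g = sum(di / (1.0 + lam * di) for di in d)
        if g > 0:
            lo = lam
        else:
            hi = lam
        if hi - lo < tol:
            break
    lam = 0.5 * (lo + hi)
    return 2.0 * sum(math.log1p(lam * di) for di in d)


def el_confidence_interval(z, crit=3.841, grid_size=400):
    """Grid inversion: keep mu values whose statistic is <= crit."""
    a, b = min(z), max(z)
    grid = [a + (b - a) * k / grid_size for k in range(1, grid_size)]
    inside = [mu for mu in grid if el_statistic(z, mu) <= crit]
    return (min(inside), max(inside)) if inside else None


z = [1.2, 0.8, 1.5, 2.1, 0.3, 1.0, 1.7, 0.9, 1.4, 1.1]
print(el_confidence_interval(z))
```

The grid search is deliberately crude. A root-finder on each side of the sample mean would give sharper endpoints.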
Funding: Supported by the China Postdoctoral Science Foundation under Grant No. 2019M651422, the National Natural Science Foundation of China under Grant Nos. 71701127, 11831008 and 11971171, the National Social Science Foundation Key Program under Grant No. 17ZDA091, the 111 Project of China under Grant No. B14019, the Natural Science Foundation of Shanghai under Grant Nos. 17ZR1409000 and 20ZR1423000, and the Project of Humanities and Social Science Foundation of Ministry of Education under Grant No. 20YJC910003.
Abstract: Hazard rate estimation under right censoring has been investigated extensively. The integrated squared error (ISE) is one of the most widely accepted measures of the global performance of nonparametric kernel estimators. However, no results have so far been available on the ISE of hazard rate estimation in the right-censored model when the censoring indicators are missing at random (MAR). This paper constructs an imputation estimator of the hazard rate function. It establishes the asymptotic normality of the ISE of the kernel hazard rate estimator with censoring indicators MAR. An asymptotic representation of the mean integrated squared error (MISE) is also presented. The finite-sample behavior of the estimator is investigated via a simple simulation.
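One common way to handle missing censoring indicators is to replace each missing δ_i by an estimate of P(δ = 1 | T_i) before smoothing. The sketch below plugs such imputed indicators into a smoothed Nelson-Aalen kernel hazard estimator with an Epanechnikov kernel. Several pieces are illustrative assumptions: the imputation rule, the kernel, and the precomputed input `p_hat`. Ties among observed times are not handled, and the result need not match the estimator studied in the paper.

```python
def kernel_hazard(t, times, delta, xi, p_hat, h):
    """Smoothed Nelson-Aalen kernel hazard estimate at time t.

    times : observed T_i = min(failure time, censoring time)
    delta : censoring indicators (None where missing)
    xi    : 1 if delta_i is observed, else 0
    p_hat : estimated P(delta = 1 | T_i), used to impute missing delta_i
    h     : bandwidth
    """
    def K(u):  # Epanechnikov kernel on [-1, 1]
        return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0

    n = len(times)
    order = sorted(range(n), key=lambda i: times[i])
    est = 0.0
    for rank, i in enumerate(order):       # rank 0 = smallest observed time
        at_risk = n - rank                 # subjects still at risk (no ties)
        d_imp = delta[i] if xi[i] else p_hat[i]
        est += K((t - times[i]) / h) * d_imp / at_risk
    return est / h


# The indicator for T = 2.0 is missing and imputed by p_hat = 0.4.
print(round(kernel_hazard(2.0, [1.0, 2.0, 3.0], [1, None, 1], [1, 0, 1],
                          [0.9, 0.4, 0.8], h=1.0), 6))  # 0.15
```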
Funding: Supported by the National Natural Science Foundation of China (No. 11271088, 11361011, 11201088), the Guangxi "Bagui Scholar" Special Project Foundation, and the Natural Science Foundation of Guangxi (No. 2013GXNSFAA019004, 2013GXNSFAA019007, 2013GXNSFBA019001).
Abstract: Suppose that we have a partially linear model $Y_i = x_i^{\top}\beta + g(t_i) + \varepsilon_i$ with independent zero-mean errors $\varepsilon_i$. The design points $\{(x_i, t_i),\ i = 1, \dots, n\}$ are non-random and completely observed, while the responses $\{Y_i,\ i = 1, \dots, n\}$ are missing at random (MAR). Two types of estimators of $\beta$ and of $g(t)$ for fixed $t$ are investigated: estimators based on semiparametric regression imputation and on inverse probability weighted imputation. Asymptotic normality of the estimators is established and used to construct normal-approximation-based confidence intervals for $\beta$ and $g(t)$. Results of a simulation study on the finite-sample performance of the proposed estimators and confidence intervals are reported.
Abstract: Feature screening with missing data is a critical problem that has not been well addressed in the literature. In this discussion, we propose a new screening index based on "information value" and apply it to feature screening with missing covariates.
Funding: Supported in part by the NSF of China (No. 11461029), the NSF of Jiangxi Province (No. 20142BAB211014), and the YSFP of Jiangxi Provincial Education Department (No. GJJ14350).
Abstract: In this paper, we consider empirical-likelihood-based inference for varying coefficient models $Y = X^{\tau}\alpha(U) + \varepsilon$ when the covariates $X$ are missing at random. Based on the inverse probability weighting idea, a class of empirical log-likelihood ratios and two maximum empirical likelihood estimators are developed for $\alpha(u)$. The resulting statistics are shown to have standard chi-squared or normal distributions asymptotically. Simulation studies are also conducted to illustrate the finite-sample properties of the proposed statistics.
Funding: Supported by the NNSF of China (No. 11271347) and the Fundamental Research Funds for the Central Universities.
Abstract: Missing data and time-dependent covariates often arise simultaneously in longitudinal studies, and directly applying classical approaches may result in a loss of efficiency and biased estimates. To deal with this problem, we propose weighted corrected estimating equations under the missing-at-random mechanism. We then develop a shrinkage empirical likelihood approach for estimating the parameters of interest when time-dependent covariates are present. The procedure improves efficiency over the generalized estimating equations approach under a working independence assumption. It does so by combining the independence estimating equations with additional information extracted from the estimating equations that the independence assumption excludes. The contribution of each remaining estimating equation is weighted according to the likelihood that it is a consistent estimating equation and the information it carries. We show that the estimators are asymptotically normally distributed. We also show that the empirical likelihood ratio statistic and its profile counterpart asymptotically follow central chi-square distributions when evaluated at the true parameter. The practical performance of our approach is demonstrated through numerical simulations and a data analysis.
Funding: Supported by the National Natural Science Foundation of China (No. 10971038) and the Natural Science Foundation of Guangxi (No. 2010GXNSFA013117).
Abstract: An empirical likelihood (EL) ratio statistic for $\theta = g(x)$ is constructed by inverse probability weighted imputation in a nonparametric regression model $Y = g(x) + \varepsilon$, $x \in [0, 1]^p$, with fixed designs and missing responses. The statistic is asymptotically $\chi^2_1$ distributed, and this result is used to obtain an EL-based confidence interval for $\theta$.