In this paper, a model averaging method is proposed for varying-coefficient models with response missing at random by establishing a weight selection criterion based on cross-validation. Under certain regularity condi...In this paper, a model averaging method is proposed for varying-coefficient models with response missing at random by establishing a weight selection criterion based on cross-validation. Under certain regularity conditions, it is proved that the proposed method is asymptotically optimal in the sense of achieving the minimum squared error.展开更多
It is known that conditional independence is a quite basic assumption in many fields of statistics. How to test its validity is of great importance and has been extensively studied by the literature. Nevertheless, all...It is known that conditional independence is a quite basic assumption in many fields of statistics. How to test its validity is of great importance and has been extensively studied by the literature. Nevertheless, all of the existing methods focus on the case that data are fully observed, but none of them seems having taken into account of the scenario when missing data are present. Motivated by this, this paper develops two testing statistics to handle such a situation relying on the idea of inverse probability weighted and augmented inverse probability weighted techniques. The asymptotic distributions of the proposed statistics are also derived under the null hypothesis. The simulation studies indicate that both testing statistics perform well in terms of size and power.展开更多
Many real-world datasets suffer from the unavoidable issue of missing values,and therefore classification with missing data has to be carefully handled since inadequate treatment of missing values will cause large err...Many real-world datasets suffer from the unavoidable issue of missing values,and therefore classification with missing data has to be carefully handled since inadequate treatment of missing values will cause large errors.In this paper,we propose a random subspace sampling method,RSS,by sampling missing items from the corresponding feature histogram distributions in random subspaces,which is effective and efficient at different levels of missing data.Unlike most established approaches,RSS does not train on fixed imputed datasets.Instead,we design a dynamic training strategy where the filled values change dynamically by resampling during training.Moreover,thanks to the sampling strategy,we design an ensemble testing strategy where we combine the results of multiple runs of a single model,which is more efficient and resource-saving than previous ensemble methods.Finally,we combine these two strategies with the random subspace method,which makes our estimations more robust and accurate.The effectiveness of the proposed RSS method is well validated by experimental studies.展开更多
On the assumption that random interruptions in the observation process are modeled by a sequence of independent Bernoulli random variables, we firstly generalize two kinds of nonlinear filtering methods with random in...On the assumption that random interruptions in the observation process are modeled by a sequence of independent Bernoulli random variables, we firstly generalize two kinds of nonlinear filtering methods with random interruption failures in the observation based on the extended Kalman filtering (EKF) and the unscented Kalman filtering (UKF), which were shortened as GEKF and CUKF in this paper, respectively. Then the nonlinear filtering model is established by using the radial basis function neural network (RBFNN) prototypes and the network weights as state equation and the output of RBFNN to present the observation equation. Finally, we take the filtering problem under missing observed data as a special case of nonlinear filtering with random intermittent failures by setting each missing data to be zero without needing to pre-estimate the missing data, and use the GEKF-based RBFNN and the GUKF-based RBFNN to predict the ground radioactivity time series with missing data. Experimental results demonstrate that the prediction results of GUKF-based RBFNN accord well with the real ground radioactivity time series while the prediction results of GEKF-based RBFNN are divergent.展开更多
In this article, to improve the doubly robust estimator, the nonlinear regression models with missing responses are studied. Based on the covariate balancing propensity score (CBPS), estimators for the regression coef...In this article, to improve the doubly robust estimator, the nonlinear regression models with missing responses are studied. Based on the covariate balancing propensity score (CBPS), estimators for the regression coefficients and the population mean are obtained. It is proved that the proposed estimators are asymptotically normal. In simulation studies, the proposed estimators show improved performance relative to usual augmented inverse probability weighted estimators.展开更多
In this study, we investigate the effects of missing data when estimating HIV/TB co-infection. We revisit the concept of missing data and examine three available approaches for dealing with missingness. The main objec...In this study, we investigate the effects of missing data when estimating HIV/TB co-infection. We revisit the concept of missing data and examine three available approaches for dealing with missingness. The main objective is to identify the best method for correcting missing data in TB/HIV Co-infection setting. We employ both empirical data analysis and extensive simulation study to examine the effects of missing data, the accuracy, sensitivity, specificity and train and test error for different approaches. The novelty of this work hinges on the use of modern statistical learning algorithm when treating missingness. In the empirical analysis, both HIV data and TB-HIV co-infection data imputations were performed, and the missing values were imputed using different approaches. In the simulation study, sets of 0% (Complete case), 10%, 30%, 50% and 80% of the data were drawn randomly and replaced with missing values. Results show complete cases only had a co-infection rate (95% Confidence Interval band) of 29% (25%, 33%), weighted method 27% (23%, 31%), likelihood-based approach 26% (24%, 28%) and multiple imputation approach 21% (20%, 22%). In conclusion, MI remains the best approach for dealing with missing data and failure to apply it, results to overestimation of HIV/TB co-infection rate by 8%.展开更多
In this paper, three smoothed empirical log-likelihood ratio functions for the parameters of nonlinear models with missing response are suggested. Under some regular conditions, the corresponding Wilks phenomena are o...In this paper, three smoothed empirical log-likelihood ratio functions for the parameters of nonlinear models with missing response are suggested. Under some regular conditions, the corresponding Wilks phenomena are obtained and the confidence regions for the parameter can be constructed easily.展开更多
文摘In this paper, a model averaging method is proposed for varying-coefficient models with response missing at random by establishing a weight selection criterion based on cross-validation. Under certain regularity conditions, it is proved that the proposed method is asymptotically optimal in the sense of achieving the minimum squared error.
基金supported by the Fundamental Research Funds for the Central Universities(17CX02035A)supported by NNSF of China(11601197,11461029,61563018)+2 种基金China Postdoctoral Science Foundation funded project(2016M600511,2017T100475)NSF of Jiangxi Province(20171ACB21030,20161BAB201024,20161ACB200009)the Key Science Fund Project of Jiangxi provincial education department(GJJ150439)
文摘It is known that conditional independence is a quite basic assumption in many fields of statistics. How to test its validity is of great importance and has been extensively studied by the literature. Nevertheless, all of the existing methods focus on the case that data are fully observed, but none of them seems having taken into account of the scenario when missing data are present. Motivated by this, this paper develops two testing statistics to handle such a situation relying on the idea of inverse probability weighted and augmented inverse probability weighted techniques. The asymptotic distributions of the proposed statistics are also derived under the null hypothesis. The simulation studies indicate that both testing statistics perform well in terms of size and power.
基金supported by the National Natural Science Foundation of China under Grant Nos.61772256 and 61921006.
文摘Many real-world datasets suffer from the unavoidable issue of missing values,and therefore classification with missing data has to be carefully handled since inadequate treatment of missing values will cause large errors.In this paper,we propose a random subspace sampling method,RSS,by sampling missing items from the corresponding feature histogram distributions in random subspaces,which is effective and efficient at different levels of missing data.Unlike most established approaches,RSS does not train on fixed imputed datasets.Instead,we design a dynamic training strategy where the filled values change dynamically by resampling during training.Moreover,thanks to the sampling strategy,we design an ensemble testing strategy where we combine the results of multiple runs of a single model,which is more efficient and resource-saving than previous ensemble methods.Finally,we combine these two strategies with the random subspace method,which makes our estimations more robust and accurate.The effectiveness of the proposed RSS method is well validated by experimental studies.
基金Project supported by the State Key Program of the National Natural Science of China (Grant No. 60835004)the Natural Science Foundation of Jiangsu Province of China (Grant No. BK2009727)+1 种基金the Natural Science Foundation of Higher Education Institutions of Jiangsu Province of China (Grant No. 10KJB510004)the National Natural Science Foundation of China (Grant No. 61075028)
文摘On the assumption that random interruptions in the observation process are modeled by a sequence of independent Bernoulli random variables, we firstly generalize two kinds of nonlinear filtering methods with random interruption failures in the observation based on the extended Kalman filtering (EKF) and the unscented Kalman filtering (UKF), which were shortened as GEKF and CUKF in this paper, respectively. Then the nonlinear filtering model is established by using the radial basis function neural network (RBFNN) prototypes and the network weights as state equation and the output of RBFNN to present the observation equation. Finally, we take the filtering problem under missing observed data as a special case of nonlinear filtering with random intermittent failures by setting each missing data to be zero without needing to pre-estimate the missing data, and use the GEKF-based RBFNN and the GUKF-based RBFNN to predict the ground radioactivity time series with missing data. Experimental results demonstrate that the prediction results of GUKF-based RBFNN accord well with the real ground radioactivity time series while the prediction results of GEKF-based RBFNN are divergent.
文摘In this article, to improve the doubly robust estimator, the nonlinear regression models with missing responses are studied. Based on the covariate balancing propensity score (CBPS), estimators for the regression coefficients and the population mean are obtained. It is proved that the proposed estimators are asymptotically normal. In simulation studies, the proposed estimators show improved performance relative to usual augmented inverse probability weighted estimators.
文摘In this study, we investigate the effects of missing data when estimating HIV/TB co-infection. We revisit the concept of missing data and examine three available approaches for dealing with missingness. The main objective is to identify the best method for correcting missing data in TB/HIV Co-infection setting. We employ both empirical data analysis and extensive simulation study to examine the effects of missing data, the accuracy, sensitivity, specificity and train and test error for different approaches. The novelty of this work hinges on the use of modern statistical learning algorithm when treating missingness. In the empirical analysis, both HIV data and TB-HIV co-infection data imputations were performed, and the missing values were imputed using different approaches. In the simulation study, sets of 0% (Complete case), 10%, 30%, 50% and 80% of the data were drawn randomly and replaced with missing values. Results show complete cases only had a co-infection rate (95% Confidence Interval band) of 29% (25%, 33%), weighted method 27% (23%, 31%), likelihood-based approach 26% (24%, 28%) and multiple imputation approach 21% (20%, 22%). In conclusion, MI remains the best approach for dealing with missing data and failure to apply it, results to overestimation of HIV/TB co-infection rate by 8%.
文摘In this paper, three smoothed empirical log-likelihood ratio functions for the parameters of nonlinear models with missing response are suggested. Under some regular conditions, the corresponding Wilks phenomena are obtained and the confidence regions for the parameter can be constructed easily.