Journal Articles
9 articles found
1. Comparative Variance and Multiple Imputation Used for Missing Values in Land Price DataSet (Cited by: 1)
Authors: Longqing Zhang, Xinwei Zhang, Liping Bai, Yanghong Zhang, Feng Sun, Changcheng Chen. Computers, Materials & Continua (SCIE, EI), 2019, No. 9, pp. 1175-1187 (13 pages).
Based on a two-dimensional relation table, this paper studies the missing values in land price sample data from Shunde District, Foshan City. GeoDa software was used to eliminate insignificant factors by stepwise regression analysis; NORM software was adopted to construct the multiple imputation models; the EM algorithm and the data augmentation algorithm were applied to fit multiple linear regression equations and construct five different imputed datasets. Statistical analysis was performed on each imputed dataset to calculate its mean and variance, and weights were determined according to the differences among them. Finally, the weighted results were integrated to obtain the imputation expression for the missing values. Three missingness scenarios were examined: the PRICE variable missing at a 5% deletion rate, the PRICE variable missing at a 10% deletion rate, and both the PRICE and CBD variables missing. In these scenarios, the new method produced values closer to the truth than traditional multiple imputation in 75% vs. 25%, 62.5% vs. 37.5%, and 100% vs. 0% of comparisons, respectively. The new method is therefore clearly better than traditional multiple imputation methods, and the missing values it estimates have reference value.
Keywords: imputation method; multiple imputations; probabilistic model
Download PDF
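The paper's final step, combining the five imputed datasets into one estimate with variance-based weights, can be sketched minimally. The numbers below are hypothetical, not the authors' land-price data, and inverse-variance weighting is one plausible reading of "the weight is determined according to the differences":

```python
import numpy as np

# Five hypothetical imputed values for one missing PRICE cell, each
# paired with the variance of the imputed dataset it came from.
imputed_values = np.array([812.0, 790.0, 805.0, 820.0, 798.0])
dataset_vars = np.array([25.0, 40.0, 30.0, 55.0, 35.0])

# Inverse-variance weights: more stable datasets count more.
weights = 1.0 / dataset_vars
weights /= weights.sum()

# Combined imputation for the missing cell (about 804.8 here,
# pulled toward the low-variance datasets).
combined = float(np.dot(weights, imputed_values))
print(round(combined, 2))
```

The plain average of the five values is 805.0; the weighted combination shifts slightly toward the imputations drawn from less dispersed datasets.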
2. Why Can Multiple Imputations and How (MICE) Algorithm Work?
Authors: Abdullah Z. Alruhaymi, Charles J. Kim. Open Journal of Statistics, 2021, No. 5, pp. 759-777 (19 pages).
Multiple imputation compensates for missing data and produces multiple completed datasets via regression models, and it is considered the solution to the old problem of univariate imputation. Univariate imputation fills in data only from the specific column where a cell is missing; multivariate imputation works simultaneously with all variables in all columns, whether missing or observed, and has emerged as a principal method of solving missing data problems. All incomplete datasets analyzed before Multiple Imputation by Chained Equations (MICE) was introduced were misdiagnosed; the results obtained were invalid and should not be counted on to yield reasonable conclusions. This article highlights why multiple imputation works and how the MICE algorithm works, with a particular focus on a cyber-security dataset. Removing missing data from a dataset and replacing it is imperative for analyzing the data and creating prediction models. A good imputation technique should therefore recover the missingness, which involves extracting the good features. However, the widely used univariate imputation method does not impute missingness reasonably when values are too large, and may thus lead to bias. We therefore aim to propose an alternative imputation method that is efficient and removes potential bias after handling the missingness.
Keywords: multiple imputations; imputations; algorithms; MICE algorithm
Download PDF
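The chained-equations idea this article describes can be sketched with one incomplete column and plain NumPy: initialize the missing cells with the column mean, then regress the incomplete column on the complete one and refill from the fitted line. This is a deterministic skeleton under toy data only; real MICE cycles over several incomplete columns and adds random draws to the refills to reflect imputation uncertainty.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: two correlated columns; knock out ~20% of column 1.
n = 200
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(scale=0.5, size=n)
data = np.column_stack([x, y])
miss = rng.random(n) < 0.2
data[miss, 1] = np.nan

def mice_one_column(data, n_iter=5):
    """One chained-equations cycle specialised to a single incomplete
    column: mean-initialize, regress on the complete column, refill.
    With only one incomplete column a single pass already converges;
    the loop mirrors MICE's repeated cycling over columns."""
    filled = data.copy()
    missing = np.isnan(filled[:, 1])
    filled[missing, 1] = np.nanmean(filled[:, 1])  # start from the mean
    for _ in range(n_iter):
        # Least-squares fit of column 1 on column 0 (with intercept),
        # using only the originally observed rows.
        A = np.column_stack([np.ones(len(filled)), filled[:, 0]])
        coef, *_ = np.linalg.lstsq(A[~missing], filled[~missing, 1], rcond=None)
        filled[missing, 1] = A[missing] @ coef  # refill from the regression
    return filled

completed = mice_one_column(data)
assert not np.isnan(completed).any()
```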
3. Multiple Imputation of Missing Data: A Simulation Study on a Binary Response
Authors: Jochen Hardt, Max Herke, Tamara Brian, Wilfried Laubach. Open Journal of Statistics, 2013, No. 5, pp. 370-378 (9 pages).
A growing number of programs for multiple imputation of missing values are becoming available in statistical software. Two algorithms are mainly implemented: Expectation Maximization (EM) and Multiple Imputation by Chained Equations (MICE). They have been shown to work well in large samples or when only small proportions of missing data are to be imputed. However, some researchers have begun to impute large proportions of missing data or to apply the method to small samples. A simulation was performed using MICE on datasets with 50, 100 or 200 cases and four or eleven variables. A varying proportion of data (3% - 63%) was set as missing completely at random and subsequently substituted using multiple imputation by chained equations. In a logistic regression model, four coefficients, i.e. non-zero and zero main effects as well as non-zero and zero interaction effects, were examined. Estimates of all main and interaction effects were unbiased. There was considerable variance in the estimates, increasing with the proportion of missing data and decreasing with sample size. Imputation of missing data by chained equations is a useful tool for small to moderate proportions of missing data. The method has its limits, however: in small samples, there are considerable random errors for all effects.
Keywords: multiple imputation; chained equations; large proportion missing; main effect; interaction effect
Download PDF
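After MICE produces m completed datasets, per-dataset regression results are pooled with Rubin's rules: average the point estimates, then combine within- and between-imputation variance. A minimal sketch with hypothetical coefficient estimates (not the simulation's actual numbers):

```python
import numpy as np

# Hypothetical per-imputation results for one logistic-regression
# coefficient: m point estimates Q_i and their squared standard errors U_i.
estimates = np.array([0.52, 0.47, 0.55, 0.49, 0.51])
variances = np.array([0.010, 0.012, 0.011, 0.009, 0.013])
m = len(estimates)

q_bar = estimates.mean()          # pooled point estimate
w = variances.mean()              # within-imputation variance
b = estimates.var(ddof=1)         # between-imputation variance
t = w + (1 + 1 / m) * b           # total variance (Rubin's rule)
se = np.sqrt(t)                   # pooled standard error
```

The between-imputation term b is what grows with the proportion of missing data, which matches the article's finding that estimate variance increases as more data are imputed.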
4. An Efficient Multiple Imputation Approach for Estimating Equations with Response Missing at Random and High-Dimensional Covariates (Cited by: 1)
Authors: WANG Lei, SUN Siying, XIA Zheng. Journal of Systems Science & Complexity (SCIE, EI, CSCD), 2021, No. 1, pp. 440-464 (25 pages).
Empirical-likelihood-based inference for parameters defined by the general estimating equations of Qin and Lawless (1994) remains an active research topic. When the response is missing at random (MAR) and the dimension of the covariate is not low, the authors propose a two-stage estimation procedure using dimension-reduced kernel estimators in conjunction with an unbiased estimating function based on augmented inverse probability weighting and multiple imputation (AIPW-MI) methods. The authors show that the resulting estimator achieves consistency and asymptotic normality. In addition, the corresponding empirical likelihood ratio statistics asymptotically follow central chi-square distributions when evaluated at the true parameter. The finite-sample performance of the proposed estimator is studied through simulation, and an application to an HIV-CD4 dataset is also presented.
Keywords: consistency and asymptotic normality; dimension reduction; kernel-assisted; missing at random; multiple imputation
Full text delivery
5. Fraction of Missing Information (γ) at Different Missing Data Fractions in the 2012 NAMCS Physician Workflow Mail Survey
Authors: Qiyuan Pan, Rong Wei. Applied Mathematics, 2016, No. 10, pp. 1057-1067 (11 pages).
In his 1987 classic book on multiple imputation (MI), Rubin used the fraction of missing information, γ, to define the relative efficiency (RE) of MI as RE = (1 + γ/m)^(−1/2), where m is the number of imputations, leading to the conclusion that a small m (≤5) would be sufficient for MI. However, evidence has been accumulating that many more imputations are needed. Why would the apparently sufficient m deduced from the RE actually be too small? The answer may lie with γ. In this research, γ was determined at fractions of missing data (δ) of 4%, 10%, 20%, and 29% using the 2012 Physician Workflow Mail Survey of the National Ambulatory Medical Care Survey (NAMCS). The γ values were strikingly small, ranging on the order of 10^(−6) to 0.01. As δ increased, γ usually increased but sometimes decreased. How the data were analysed had the dominant effect on γ, overshadowing the effect of δ. The results suggest that it is impossible to predict γ from δ and that it may not be appropriate to use the γ-based RE to determine a sufficient m.
Keywords: multiple imputation; fraction of missing information (γ); sufficient number of imputations; missing data; NAMCS
Download PDF
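Rubin's relative-efficiency formula quoted in the abstract is easy to evaluate directly; doing so shows why a small m looks sufficient whenever γ is small, which is the premise this paper questions:

```python
def relative_efficiency(gamma: float, m: int) -> float:
    """RE = (1 + gamma/m) ** (-1/2): efficiency of m imputations
    relative to infinitely many, per Rubin (1987)."""
    return (1.0 + gamma / m) ** -0.5

# Even at gamma = 0.5, five imputations already give ~95% efficiency,
# which is why m <= 5 was long considered sufficient.
for gamma in (0.01, 0.3, 0.5):
    print(gamma, [round(relative_efficiency(gamma, m), 4) for m in (5, 20, 100)])
```

With the tiny γ values the study reports (10^(−6) to 0.01), RE is essentially 1 even at m = 5, so the formula alone cannot justify a choice of m.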
6. Estimation of Durability of Profit of Small and Medium Enterprises by Statistical Matching
Author: Yukiko KURIHARA. Journal of Mathematics and System Science, 2015, No. 5, pp. 173-182 (10 pages).
This study computes the durability of Return on Assets (ROA) in small and medium enterprises from different sample datasets. Utilizing information from the Financial Statements Statistics of Corporations by Industry, it verifies the precision of correlation coefficients using Non-iterative Bayesian-based Imputation (NIBAS) and the multiple imputation method for all combinations of common variables with auxiliary files. The paper has three important findings. First, statistical matching estimates of higher precision can be obtained using key variable sets with higher canonical correlation coefficients. Second, even if a key variable set has a high canonical correlation coefficient, key variables that are extremely strongly correlated with target variables and have high kurtosis should not be used. Finally, using auxiliary files can improve the precision of statistical matching estimates. Accordingly, the durability of ROA in small and medium enterprises is computed. The author finds that the series of ROA correlations fluctuates more for smaller enterprises than for larger ones, and thus the vulnerability of ROA in small and medium enterprises can be clarified via statistical matching.
Keywords: Bayesian regression imputation; multiple imputation; canonical correlation coefficient; sampling experiment
Download PDF
7. Data augmentation for bias correction in mapping PM2.5 based on satellite retrievals and ground observations (Cited by: 1)
Authors: Tan Mi, Die Tang, Jianbo Fu, Wen Zeng, Michael L. Grieneisen, Zihang Zhou, Fengju Jia, Fumo Yang, Yu Zhan. Geoscience Frontiers (SCIE, CAS, CSCD), 2024, No. 1, pp. 17-28 (12 pages).
As most air quality monitoring sites worldwide are in urban areas, machine learning models may produce substantial estimation bias in rural areas when deriving spatiotemporal distributions of air pollutants. The bias stems from the issue of dataset shift, as the density distributions of predictor variables differ greatly between urban and rural areas. We propose a data-augmentation approach based on multiple imputation by chained equations (MICE-DA) to remedy the dataset shift problem. Compared with the benchmark models, MICE-DA exhibits superior predictive performance in deriving the spatiotemporal distributions of hourly PM2.5 in the megacity (Chengdu) at the foot of the Tibetan Plateau, especially for correcting the estimation bias, with the mean bias decreasing from −3.4 µg/m³ to −1.6 µg/m³. As a complement to the holdout validation, the semi-variance results show that MICE-DA decently preserves the spatial autocorrelation pattern of PM2.5 over the study area. The essence of MICE-DA is strengthening the correlation between PM2.5 and aerosol optical depth (AOD) during the data augmentation. Consequently, the importance of AOD is largely enhanced for predicting PM2.5, and the summed relative importance of the two satellite-retrieved AOD variables increases from 5.5% to 18.4%. This study resolves the puzzle that AOD has exhibited relatively low importance in local or regional studies. The results can advance the utilization of satellite remote sensing in modeling air quality while drawing more attention to the common dataset shift problem in data-driven environmental research.
Keywords: aerosol optical depth; dataset shift; spatiotemporal distribution; air quality monitoring; multiple imputation by chained equations
Full text delivery
8. Regression Analysis of Doubly Censored Data with a Cured Subgroup under a Class of Promotion Time Cure Models
Authors: Min CAI, Li Qun XIAO, Shu Wei LI. Acta Mathematica Sinica, English Series (SCIE, CSCD), 2021, No. 6, pp. 835-853 (19 pages).
In some situations, the failure time of interest is defined as the gap time between two related events, and the observations on both event times can suffer either right or interval censoring. Such data are usually referred to as doubly censored data and are frequently encountered in many clinical and observational studies. Additionally, there may also exist a cured subgroup in the population, meaning that not every individual under study will eventually experience the failure time of interest. In this paper, we consider regression analysis of doubly censored data with a cured subgroup under a wide class of flexible transformation cure models. Specifically, we consider marginal likelihood estimation and develop a two-step approach combining multiple imputation with a new expectation-maximization (EM) algorithm for its implementation. The resulting estimators are shown to be consistent and asymptotically normal. The finite-sample performance of the proposed method is investigated through simulation studies. The method is also applied to a real dataset from an AIDS cohort study for illustration.
Keywords: doubly censored data; marginal likelihood; EM algorithm; multiple imputation; transformation cure models
Full text delivery
9. Applying Gregory Johnson's Concepts of Scalar Stress to Scale and Information Thresholds in Holocene Social Evolution
Author: Laura J. Ellyson. Journal of Social Computing (EI), 2022, No. 1, pp. 38-56 (19 pages).
Although Gregory Johnson's models have influenced social theory in archaeology, few have applied or built upon these models to predict aspects of social organization, group size, or fissioning; exceptions have been limited to small case studies. Recently, the relationship between a society's scale and its information-processing capacities has been explored using the Seshat Databank. Here, I apply multiple linear regression analysis to the Seshat data using Turchin and colleagues' 9 "complexity characteristics" (CCs) to further examine the relationship between the hierarchy CC and the remaining 8 CCs, which include both aspects of a polity's scale and aspects of what Kohler et al. call "collective computation". The results support Johnson's ideas that stratification generally increases with a polity's scale (population, territory); however, stratification is also higher when polities increase their development of information-processing variables such as texts.
Keywords: hierarchy; scalar stress; Seshat Databank; multiple imputation; multiple regression
Full text delivery