期刊文献+
共找到1,437篇文章
< 1 2 72 >
每页显示 20 50 100
Optimal Estimation of High-Dimensional Covariance Matrices with Missing and Noisy Data
1
作者 Meiyin Wang Wanzhou Ye 《Advances in Pure Mathematics》 2024年第4期214-227,共14页
The estimation of covariance matrices is very important in many fields, such as statistics. In real applications, data are frequently influenced by high dimensions and noise. However, most relevant studies are based o... The estimation of covariance matrices is very important in many fields, such as statistics. In real applications, data are frequently influenced by high dimensions and noise. However, most relevant studies are based on complete data. This paper studies the optimal estimation of high-dimensional covariance matrices based on missing and noisy sample under the norm. First, the model with sub-Gaussian additive noise is presented. The generalized sample covariance is then modified to define a hard thresholding estimator , and the minimax upper bound is derived. After that, the minimax lower bound is derived, and it is concluded that the estimator presented in this article is rate-optimal. Finally, numerical simulation analysis is performed. The result shows that for missing samples with sub-Gaussian noise, if the true covariance matrix is sparse, the hard thresholding estimator outperforms the traditional estimate method. 展开更多
关键词 high-dimensional Covariance Matrix missing data Sub-Gaussian Noise Optimal Estimation
下载PDF
A Practical Approach for Missing Wireless Sensor Networks Data Recovery
2
作者 Song Xiaoxiang Guo Yan +1 位作者 Li Ning Ren Bing 《China Communications》 SCIE CSCD 2024年第5期202-217,共16页
In wireless sensor networks(WSNs),the performance of related applications is highly dependent on the quality of data collected.Unfortunately,missing data is almost inevitable in the process of data acquisition and tra... In wireless sensor networks(WSNs),the performance of related applications is highly dependent on the quality of data collected.Unfortunately,missing data is almost inevitable in the process of data acquisition and transmission.Existing methods often rely on prior information such as low-rank characteristics or spatiotemporal correlation when recovering missing WSNs data.However,in realistic application scenarios,it is very difficult to obtain these prior information from incomplete data sets.Therefore,we aim to recover the missing WSNs data effectively while getting rid of the perplexity of prior information.By designing the corresponding measurement matrix that can capture the position of missing data and sparse representation matrix,a compressive sensing(CS)based missing data recovery model is established.Then,we design a comparison standard to select the best sparse representation basis and introduce average cross-correlation to examine the rationality of the established model.Furthermore,an improved fast matching pursuit algorithm is proposed to solve the model.Simulation results show that the proposed method can effectively recover the missing WSNs data. 展开更多
关键词 average cross correlation matching pursuit missing data wireless sensor networks
下载PDF
Censored Composite Conditional Quantile Screening for High-Dimensional Survival Data
3
作者 LIU Wei LI Yingqiu 《应用概率统计》 CSCD 北大核心 2024年第5期783-799,共17页
In this paper,we introduce the censored composite conditional quantile coefficient(cC-CQC)to rank the relative importance of each predictor in high-dimensional censored regression.The cCCQC takes advantage of all usef... In this paper,we introduce the censored composite conditional quantile coefficient(cC-CQC)to rank the relative importance of each predictor in high-dimensional censored regression.The cCCQC takes advantage of all useful information across quantiles and can detect nonlinear effects including interactions and heterogeneity,effectively.Furthermore,the proposed screening method based on cCCQC is robust to the existence of outliers and enjoys the sure screening property.Simulation results demonstrate that the proposed method performs competitively on survival datasets of high-dimensional predictors,particularly when the variables are highly correlated. 展开更多
关键词 high-dimensional survival data censored composite conditional quantile coefficient sure screening property rank consistency property
下载PDF
Comparison of two statistical methods for handling missing values of quantitative data in Bayesian N-of-1 trials: a simulation study
4
作者 Jing-Bo Zhai Tian-Ci Guo Wei-Jie Yu 《Medical Data Mining》 2024年第1期10-15,共6页
Background:Missing data are frequently occurred in clinical studies.Due to the development of precision medicine,there is an increased interest in N-of-1 trial.Bayesian models are one of main statistical methods for a... Background:Missing data are frequently occurred in clinical studies.Due to the development of precision medicine,there is an increased interest in N-of-1 trial.Bayesian models are one of main statistical methods for analyzing the data of N-of-1 trials.This simulation study aimed to compare two statistical methods for handling missing values of quantitative data in Bayesian N-of-1 trials.Methods:The simulated data of N-of-1 trials with different coefficients of autocorrelation,effect sizes and missing ratios are obtained by SAS 9.1 system.The missing values are filled with mean filling and regression filling respectively in the condition of different coefficients of autocorrelation,effect sizes and missing ratios by SPSS 25.0 software.Bayesian models are built to estimate the posterior means by Winbugs 14 software.Results:When the missing ratio is relatively small,e.g.5%,missing values have relatively little effect on the results.Therapeutic effects may be underestimated when the coefficient of autocorrelation increases and no filling is used.However,it may be overestimated when mean or regression filling is used,and the results after mean filling are closer to the actual effect than regression filling.In the case of moderate missing ratio,the estimated effect after mean filling is closer to the actual effect compared to regression filling.When a large missing ratio(20%)occurs,data missing can lead to significantly underestimate the effect.In this case,the estimated effect after regression filling is closer to the actual effect compared to mean filling.Conclusion:Data missing can affect the estimated therapeutic effects using Bayesian models in N-of-1 trials.The present study suggests that mean filling can be used under situation of missing ratio≤10%.Otherwise,regression filling may be preferable. 展开更多
关键词 N-of-1 trial BAYESIAN missing data simulation study
下载PDF
Observation points classifier ensemble for high-dimensional imbalanced classification 被引量:1
5
作者 Yulin He Xu Li +3 位作者 Philippe Fournier‐Viger Joshua Zhexue Huang Mianjie Li Salman Salloum 《CAAI Transactions on Intelligence Technology》 SCIE EI 2023年第2期500-517,共18页
In this paper,an Observation Points Classifier Ensemble(OPCE)algorithm is proposed to deal with High-Dimensional Imbalanced Classification(HDIC)problems based on data processed using the Multi-Dimensional Scaling(MDS)... In this paper,an Observation Points Classifier Ensemble(OPCE)algorithm is proposed to deal with High-Dimensional Imbalanced Classification(HDIC)problems based on data processed using the Multi-Dimensional Scaling(MDS)feature extraction technique.First,dimensionality of the original imbalanced data is reduced using MDS so that distances between any two different samples are preserved as well as possible.Second,a novel OPCE algorithm is applied to classify imbalanced samples by placing optimised observation points in a low-dimensional data space.Third,optimization of the observation point mappings is carried out to obtain a reliable assessment of the unknown samples.Exhaustive experiments have been conducted to evaluate the feasibility,rationality,and effectiveness of the proposed OPCE algorithm using seven benchmark HDIC data sets.Experimental results show that(1)the OPCE algorithm can be trained faster on low-dimensional imbalanced data than on high-dimensional data;(2)the OPCE algorithm can correctly identify samples as the number of optimised observation points is increased;and(3)statistical analysis reveals that OPCE yields better HDIC performances on the selected data sets in comparison with eight other HDIC algorithms.This demonstrates that OPCE is a viable algorithm to deal with HDIC problems. 展开更多
关键词 classifier ensemble feature transformation high-dimensional data classification imbalanced learning observation point mechanism
下载PDF
K-Nearest Neighbor Based Missing Data Estimation Algorithm in Wireless Sensor Networks 被引量:7
6
作者 Liqiang Pan Jianzhong Li 《Wireless Sensor Network》 2010年第2期115-122,共8页
In wireless sensor networks, the missing of sensor data is inevitable due to the inherent characteristic of wireless sensor networks, and it causes many difficulties in various applications. To solve the problem, the ... In wireless sensor networks, the missing of sensor data is inevitable due to the inherent characteristic of wireless sensor networks, and it causes many difficulties in various applications. To solve the problem, the missing data should be estimated as accurately as possible. In this paper, a k-nearest neighbor based missing data estimation algorithm is proposed based on the temporal and spatial correlation of sensor data. It adopts the linear regression model to describe the spatial correlation of sensor data among different sensor nodes, and utilizes the data information of multiple neighbor nodes to estimate the missing data jointly rather than independently, so that a stable and reliable estimation performance can be achieved. Experimental results on two real-world datasets show that the proposed algorithm can estimate the missing data accurately. 展开更多
关键词 missing data ESTIMATION WIRELESS SENSOR NETWORKS
下载PDF
A Length-Adaptive Non-Dominated Sorting Genetic Algorithm for Bi-Objective High-Dimensional Feature Selection
7
作者 Yanlu Gong Junhai Zhou +2 位作者 Quanwang Wu MengChu Zhou Junhao Wen 《IEEE/CAA Journal of Automatica Sinica》 SCIE EI CSCD 2023年第9期1834-1844,共11页
As a crucial data preprocessing method in data mining,feature selection(FS)can be regarded as a bi-objective optimization problem that aims to maximize classification accuracy and minimize the number of selected featu... As a crucial data preprocessing method in data mining,feature selection(FS)can be regarded as a bi-objective optimization problem that aims to maximize classification accuracy and minimize the number of selected features.Evolutionary computing(EC)is promising for FS owing to its powerful search capability.However,in traditional EC-based methods,feature subsets are represented via a length-fixed individual encoding.It is ineffective for high-dimensional data,because it results in a huge search space and prohibitive training time.This work proposes a length-adaptive non-dominated sorting genetic algorithm(LA-NSGA)with a length-variable individual encoding and a length-adaptive evolution mechanism for bi-objective highdimensional FS.In LA-NSGA,an initialization method based on correlation and redundancy is devised to initialize individuals of diverse lengths,and a Pareto dominance-based length change operator is introduced to guide individuals to explore in promising search space adaptively.Moreover,a dominance-based local search method is employed for further improvement.The experimental results based on 12 high-dimensional gene datasets show that the Pareto front of feature subsets produced by LA-NSGA is superior to those of existing algorithms. 展开更多
关键词 Bi-objective optimization feature selection(FS) genetic algorithm high-dimensional data length-adaptive
下载PDF
Similarity measurement method of high-dimensional data based on normalized net lattice subspace 被引量:4
8
作者 李文法 Wang Gongming +1 位作者 Li Ke Huang Su 《High Technology Letters》 EI CAS 2017年第2期179-184,共6页
The performance of conventional similarity measurement methods is affected seriously by the curse of dimensionality of high-dimensional data.The reason is that data difference between sparse and noisy dimensionalities... The performance of conventional similarity measurement methods is affected seriously by the curse of dimensionality of high-dimensional data.The reason is that data difference between sparse and noisy dimensionalities occupies a large proportion of the similarity,leading to the dissimilarities between any results.A similarity measurement method of high-dimensional data based on normalized net lattice subspace is proposed.The data range of each dimension is divided into several intervals,and the components in different dimensions are mapped onto the corresponding interval.Only the component in the same or adjacent interval is used to calculate the similarity.To validate this method,three data types are used,and seven common similarity measurement methods are compared.The experimental result indicates that the relative difference of the method is increasing with the dimensionality and is approximately two or three orders of magnitude higher than the conventional method.In addition,the similarity range of this method in different dimensions is [0,1],which is fit for similarity analysis after dimensionality reduction. 展开更多
关键词 high-dimensional data the curse of dimensionality SIMILARITY NORMALIZATION SUBSPACE NPsim
下载PDF
Generalized unscented Kalman filtering based radial basis function neural network for the prediction of ground radioactivity time series with missing data 被引量:2
9
作者 伍雪冬 王耀南 +1 位作者 刘维亭 朱志宇 《Chinese Physics B》 SCIE EI CAS CSCD 2011年第6期546-551,共6页
On the assumption that random interruptions in the observation process are modeled by a sequence of independent Bernoulli random variables, we firstly generalize two kinds of nonlinear filtering methods with random in... On the assumption that random interruptions in the observation process are modeled by a sequence of independent Bernoulli random variables, we firstly generalize two kinds of nonlinear filtering methods with random interruption failures in the observation based on the extended Kalman filtering (EKF) and the unscented Kalman filtering (UKF), which were shortened as GEKF and CUKF in this paper, respectively. Then the nonlinear filtering model is established by using the radial basis function neural network (RBFNN) prototypes and the network weights as state equation and the output of RBFNN to present the observation equation. Finally, we take the filtering problem under missing observed data as a special case of nonlinear filtering with random intermittent failures by setting each missing data to be zero without needing to pre-estimate the missing data, and use the GEKF-based RBFNN and the GUKF-based RBFNN to predict the ground radioactivity time series with missing data. Experimental results demonstrate that the prediction results of GUKF-based RBFNN accord well with the real ground radioactivity time series while the prediction results of GEKF-based RBFNN are divergent. 展开更多
关键词 prediction of time series with missing data random interruption failures in the observation neural network approximation
下载PDF
Ultra-High Dimensional Feature Selection and Mean Estimation under Missing at Random
10
作者 Wanhui Li Guangming Deng Dong Pan 《Open Journal of Statistics》 2023年第6期850-871,共22页
Next Generation Sequencing (NGS) provides an effective basis for estimating the survival time of cancer patients, but it also poses the problem of high data dimensionality, in addition to the fact that some patients d... Next Generation Sequencing (NGS) provides an effective basis for estimating the survival time of cancer patients, but it also poses the problem of high data dimensionality, in addition to the fact that some patients drop out of the study, making the data missing, so a method for estimating the mean of the response variable with missing values for the ultra-high dimensional datasets is needed. In this paper, we propose a two-stage ultra-high dimensional variable screening method, RF-SIS, based on random forest regression, which effectively solves the problem of estimating missing values due to excessive data dimension. After the dimension reduction process by applying RF-SIS, mean interpolation is executed on the missing responses. The results of the simulated data show that compared with the estimation method of directly deleting missing observations, the estimation results of RF-SIS-MI have significant advantages in terms of the proportion of intervals covered, the average length of intervals, and the average absolute deviation. 展开更多
关键词 Ultrahigh-dimensional data missing data Sure Independent Screening Mean Estimation
下载PDF
Missing Data Imputations for Upper Air Temperature at 24 Standard Pressure Levels over Pakistan Collected from Aqua Satellite 被引量:4
11
作者 Muhammad Usman Saleem Sajid Rashid Ahmed 《Journal of Data Analysis and Information Processing》 2016年第3期132-146,共16页
This research was an effort to select best imputation method for missing upper air temperature data over 24 standard pressure levels. We have implemented four imputation techniques like inverse distance weighting, Bil... This research was an effort to select best imputation method for missing upper air temperature data over 24 standard pressure levels. We have implemented four imputation techniques like inverse distance weighting, Bilinear, Natural and Nearest interpolation for missing data imputations. Performance indicators for these techniques were the root mean square error (RMSE), absolute mean error (AME), correlation coefficient and coefficient of determination ( R<sup>2</sup> ) adopted in this research. We randomly make 30% of total samples (total samples was 324) predictable from 70% remaining data. Although four interpolation methods seem good (producing <1 RMSE, AME) for imputations of air temperature data, but bilinear method was the most accurate with least errors for missing data imputations. RMSE for bilinear method remains <0.01 on all pressure levels except 1000 hPa where this value was 0.6. The low value of AME (<0.1) came at all pressure levels through bilinear imputations. Very strong correlation (>0.99) found between actual and predicted air temperature data through this method. The high value of the coefficient of determination (0.99) through bilinear interpolation method, tells us best fit to the surface. We have also found similar results for imputation with natural interpolation method in this research, but after investigating scatter plots over each month, imputations with this method seem to little obtuse in certain months than bilinear method. 展开更多
关键词 missing data Imputations Spatial Interpolation AQUA Satellite Upper Level Air Temperature AIRX3STML
下载PDF
Comparison of Missing Data Imputation Methods in Time Series Forecasting 被引量:1
12
作者 Hyun Ahn Kyunghee Sun Kwanghoon Pio Kim 《Computers, Materials & Continua》 SCIE EI 2022年第1期767-779,共13页
Time series forecasting has become an important aspect of data analysis and has many real-world applications.However,undesirable missing values are often encountered,which may adversely affect many forecasting tasks.I... Time series forecasting has become an important aspect of data analysis and has many real-world applications.However,undesirable missing values are often encountered,which may adversely affect many forecasting tasks.In this study,we evaluate and compare the effects of imputationmethods for estimating missing values in a time series.Our approach does not include a simulation to generate pseudo-missing data,but instead perform imputation on actual missing data and measure the performance of the forecasting model created therefrom.In an experiment,therefore,several time series forecasting models are trained using different training datasets prepared using each imputation method.Subsequently,the performance of the imputation methods is evaluated by comparing the accuracy of the forecasting models.The results obtained from a total of four experimental cases show that the k-nearest neighbor technique is the most effective in reconstructing missing data and contributes positively to time series forecasting compared with other imputation methods. 展开更多
关键词 missing data imputation method time series forecasting LSTM
下载PDF
Missing data and the accuracy of Bayesian phylogenetics 被引量:2
13
作者 John J. WIENS Daniel S. MOEN 《植物分类学报》 CSCD 北大核心 2008年第3期307-314,共8页
The effect of missing data on phylogenetic methods is a potentially important issue in our attempts to reconstruct the Tree of Life. If missing data are truly problematic, then it may be unwise to include species in a... The effect of missing data on phylogenetic methods is a potentially important issue in our attempts to reconstruct the Tree of Life. If missing data are truly problematic, then it may be unwise to include species in an analysis that lack data for some characters (incomplete taxa) or to include characters that lack data for some species. Given the difficulty of obtaining data from all characters for all taxa (e.g., fossils), missing data might seriously impede efforts to reconstruct a comprehensive phylogeny that includes all species. Fortunately, recent simulations and empirical analyses suggest that missing data cells are not themselves problematic, and that in-complete taxa can be accurately placed as long as the overall number of characters in the analysis is large. How-ever, these studies have so far only been conducted on parsimony, likelihood, and neighbor-joining methods. Although Bayesian phylogenetic methods have become widely used in recent years, the effects of missing data on Bayesian analysis have not been adequately studied. Here, we conduct simulations to test whether Bayesian analyses can accurately place incomplete taxa despite extensive missing data. In agreement with previous studies of other methods, we find that Bayesian analyses can accurately reconstruct the position of highly incomplete taxa (i.e., 95% missing data), as long as the overall number of characters in the analysis is large. These results suggest that highly incomplete taxa can be safely included in many Bayesian phylogenetic analyses. 展开更多
关键词 贝叶斯定理 系统发生学 数据缺失 准确度
下载PDF
RAD-seq data reveals robust phylogeny and morphological evolutionary history of Rhododendron
14
作者 Yuanting Shen Gang Yao +6 位作者 Yunfei Li Xiaoling Tian Shiming Li Nian Wang Chengjun Zhang Fei Wang Yongpeng Ma 《Horticultural Plant Journal》 SCIE CAS CSCD 2024年第3期866-878,共13页
Rhododendron is famous for its high ornamental value.However,the genus is taxonomically difficult and the relationships within Rhododendron remain unresolved.In addition,the origin of key morphological characters with... Rhododendron is famous for its high ornamental value.However,the genus is taxonomically difficult and the relationships within Rhododendron remain unresolved.In addition,the origin of key morphological characters with high horticulture value need to be explored.Both problems largely hinder utilization of germplasm resources.Most studies attempted to disentangle the phylogeny of Rhododendron,but only used a few genomic markers and lacked large-scale sampling,resulting in low clade support and contradictory phylogenetic signals.Here,we used restriction-site associated DNA sequencing(RAD-seq)data and morphological traits for 144 species of Rhododendron,representing all subgenera and most sections and subsections of this species-rich genus,to decipher its intricate evolutionary history and reconstruct ancestral state.Our results revealed high resolutions at subgenera and section levels of Rhododendron based on RAD-seq data.Both optimal phylogenetic tree and split tree recovered five lineages among Rhododendron.Subg.Therorhodion(cladeⅠ)formed the basal lineage.Subg.Tsutsusi and Azaleastrum formed cladeⅡand had sister relationships.CladeⅢincluded all scaly rhododendron species.Subg.Pentanthera(cladeⅣ)formed a sister group to Subg.Hymenanthes(cladeⅤ).The results of ancestral state reconstruction showed that Rhododendron ancestor was a deciduous woody plant with terminal inflorescence,ten stamens,leaf blade without scales and broadly funnelform corolla with pink or purple color.This study shows significant distinguishability to resolve the evolutionary history of Rhododendron based on high clade support of phylogenetic tree constructed by RAD-seq data.It also provides an example to resolve discordant signals in phylogenetic trees and demonstrates the application feasibility of RAD-seq with large amounts of missing data in deciphering intricate evolutionary relationships.Additionally,the reconstructed ancestral state of six important characters provides insights into the innovation of key characters in Rhododendron. 展开更多
关键词 RHODODENDRON RAD-seq missing data Quartet sampling(QS) Ancestral state reconstruction
下载PDF
Data-driven Surrogate-assisted Method for High-dimensional Multi-area Combined Economic/Emission Dispatch
15
作者 Chenhao Lin Huijun Liang +2 位作者 Aokang Pang Jianwei Zhong Yongchao Yang 《Journal of Modern Power Systems and Clean Energy》 SCIE EI CSCD 2024年第1期52-64,共13页
Multi-area combined economic/emission dispatch(MACEED)problems are generally studied using analytical functions.However,as the scale of power systems increases,ex isting solutions become time-consuming and may not mee... Multi-area combined economic/emission dispatch(MACEED)problems are generally studied using analytical functions.However,as the scale of power systems increases,ex isting solutions become time-consuming and may not meet oper ational constraints.To overcome excessive computational ex pense in high-dimensional MACEED problems,a novel data-driven surrogate-assisted method is proposed.First,a cosine-similarity-based deep belief network combined with a back-propagation(DBN+BP)neural network is utilized to replace cost and emission functions.Second,transfer learning is applied with a pretraining and fine-tuning method to improve DBN+BP regression surrogate models,thus realizing fast con struction of surrogate models between different regional power systems.Third,a multi-objective antlion optimizer with a novel general single-dimension retention bi-objective optimization poli cy is proposed to execute MACEED optimization to obtain scheduling decisions.The proposed method not only ensures the convergence,uniformity,and extensibility of the Pareto front,but also greatly reduces the computational time.Finally,a 4-ar ea 40-unit test system with different constraints is employed to demonstrate the effectiveness of the proposed method. 展开更多
关键词 Multi-area combined economic/emission dispatch high-dimensional power system deep belief network data driven transfer learning
原文传递
Tensor Completion for Recovering Multichannel Audio Signal with Missing Data
16
作者 Lidong Yang Min Liu +2 位作者 Jing Wang Xiang Xie Jingming Kuang 《China Communications》 SCIE CSCD 2019年第4期186-195,共10页
The quality of a multichannel audio signal may be reduced by missing data, which must be recovered before use. The data sets of multichannel audio can be quite large and have more than two axes of variation, such as c... The quality of a multichannel audio signal may be reduced by missing data, which must be recovered before use. The data sets of multichannel audio can be quite large and have more than two axes of variation, such as channel, frame, and feature. To recover missing audio data, we propose a low-rank tensor completion method that is a high-order generalization of matrix completion. First, a multichannel audio signal with missing data is modeled by a three-order tensor. Next, tensor completion is formulated as a convex optimization problem by defining the trace norm of the tensor, and then an augmented Lagrange multiplier method is used for solving the constrained optimization problem. Finally, the missing data is replaced by alternating iteration with a tensor computation. Experiments were conducted to evaluate the effectiveness on data of a 5.1-channel audio signal. The results show that the proposed method outperforms state-of-the-art methods. Moreover, subjective listening tests with MUSHRA(Multiple Stimuli with Hidden Reference and Anchor) indicate that better audio effects were obtained by tensor completion. 展开更多
关键词 TENSOR COMPLETION missing data MULTICHANNEL AUDIO CONVEX optimization
下载PDF
Improved interpolation method based on singular spectrum analysis iteration and its application to missing data recovery
17
作者 王辉赞 张韧 +2 位作者 刘巍 王桂华 金宝刚 《Applied Mathematics and Mechanics(English Edition)》 SCIE EI 2008年第10期1351-1361,共11页
A novel interval quartering algorithm (IQA) is proposed to overcome insufficiency of the conventional singular spectrum analysis (SSA) iterative interpolation for selecting parameters including the number of the p... A novel interval quartering algorithm (IQA) is proposed to overcome insufficiency of the conventional singular spectrum analysis (SSA) iterative interpolation for selecting parameters including the number of the principal components and the embedding dimension. Based on the improved SSA iterative interpolation, interpolated test and comparative analysis are carried out to the outgoing longwave radiation daily data. The results show that IQA can find globally optimal parameters to the error curve with local oscillation, and has advantage of fast computing speed. The improved interpolation method is effective in the interpolation of missing data. 展开更多
关键词 singular spectrum analysis outgoing longwave radiation interpolation of missing data interval quartering algorithm
下载PDF
Does Missing Data in Studies of Hard-to-Reach Populations Bias Results? Not Necessarily
18
作者 Anneliese C. Bolland Sara Tomek John M. Bolland 《Open Journal of Statistics》 2017年第2期264-289,共26页
Missing data are always an issue in community-based longitudinal studies, calling into question the representativeness of samples and bias in conclusions, the research has generated. This may be due to the difficulty ... Missing data are always an issue in community-based longitudinal studies, calling into question the representativeness of samples and bias in conclusions, the research has generated. This may be due to the difficulty of implementing random sampling procedures in these studies and/or the inherent difficulty in sampling hard-to-reach segments of the population being studied. In fact, the ability to accurately study hard-to-reach populations in light of potential bias created by missing data remains an open question. In this study, missing data are defined as both failure to interview potential research participants identified in the sampling frame and failure to retain enrolled research participants longitudinally. Using the sample from the Mobile Youth Survey, a multiple-cohort, longitudinal study of adolescents living in highly impoverished neighborhoods in Mobile, Alabama, we examined sample representativeness and dropout to determine whether missing data led to a nonrepresentative, and therefore, biased sample. Results indicate that even though random procedures are not strictly used to draw the sample, (a) the sample appears to be largely representative of the population that was studied, and (b) attrition is largely uncorrelated with characteristics of those who dropped out. This suggests that it is possible to study with validity hard-to reach populations in community settings. 展开更多
关键词 Hard-to-Reach POPULATIONS missing data REPRESENTATIVENESS COMMUNITY-BASED Research
下载PDF
Using Statistical Learning to Treat Missing Data: A Case of HIV/TB Co-Infection in Kenya
19
作者 Joshua O. Mwaro Linda Chaba Collins Odhiambo 《Journal of Data Analysis and Information Processing》 2020年第3期110-133,共24页
In this study, we investigate the effects of missing data when estimating HIV/TB co-infection. We revisit the concept of missing data and examine three available approaches for dealing with missingness. The main objec... In this study, we investigate the effects of missing data when estimating HIV/TB co-infection. We revisit the concept of missing data and examine three available approaches for dealing with missingness. The main objective is to identify the best method for correcting missing data in TB/HIV Co-infection setting. We employ both empirical data analysis and extensive simulation study to examine the effects of missing data, the accuracy, sensitivity, specificity and train and test error for different approaches. The novelty of this work hinges on the use of modern statistical learning algorithm when treating missingness. In the empirical analysis, both HIV data and TB-HIV co-infection data imputations were performed, and the missing values were imputed using different approaches. In the simulation study, sets of 0% (Complete case), 10%, 30%, 50% and 80% of the data were drawn randomly and replaced with missing values. Results show complete cases only had a co-infection rate (95% Confidence Interval band) of 29% (25%, 33%), weighted method 27% (23%, 31%), likelihood-based approach 26% (24%, 28%) and multiple imputation approach 21% (20%, 22%). In conclusion, MI remains the best approach for dealing with missing data and failure to apply it, results to overestimation of HIV/TB co-infection rate by 8%. 展开更多
关键词 missing data HIV/TB Co-Infection IMPUTATION missing at Random Count data
下载PDF
On the Use of Local Assessments for Monitoring Centrally Reviewed Endpoints with Missing Data in Clinical Trials
20
作者 Sean S. Brummel Daniel L. Gillen 《Open Journal of Statistics》 2013年第4期41-54,共14页
Due to ethical and logistical concerns it is common for data monitoring committees to periodically monitor accruing clinical trial data to assess the safety, and possibly efficacy, of a new experimental treatment. Whe... Due to ethical and logistical concerns it is common for data monitoring committees to periodically monitor accruing clinical trial data to assess the safety, and possibly efficacy, of a new experimental treatment. When formalized, monitoring is typically implemented using group sequential methods. In some cases regulatory agencies have required that primary trial analyses should be based solely on the judgment of an independent review committee (IRC). The IRC assessments can produce difficulties for trial monitoring given the time lag typically associated with receiving assessments from the IRC. This results in a missing data problem wherein a surrogate measure of response may provide useful information for interim decisions and future monitoring strategies. In this paper, we present statistical tools that are helpful for monitoring a group sequential clinical trial with missing IRC data. We illustrate the proposed methodology in the case of binary endpoints under various missingness mechanisms including missing completely at random assessments and when missingness depends on the IRC’s measurement. 展开更多
关键词 Group SEQUENTIAL Information INDEPENDENT Review ENDPOINT missing data
下载PDF
上一页 1 2 72 下一页 到第
使用帮助 返回顶部