Journal Articles
46 articles found (results 1-20 shown on this page).
1. A Practical Approach for Missing Wireless Sensor Networks Data Recovery
Authors: Song Xiaoxiang, Guo Yan, Li Ning, Ren Bing. China Communications (SCIE, CSCD), 2024, No. 5, pp. 202-217.
In wireless sensor networks (WSNs), the performance of related applications is highly dependent on the quality of the collected data. Unfortunately, missing data are almost inevitable during data acquisition and transmission. Existing methods often rely on prior information, such as low-rank characteristics or spatiotemporal correlation, when recovering missing WSN data. However, in realistic application scenarios it is very difficult to obtain such prior information from incomplete data sets. We therefore aim to recover missing WSN data effectively without depending on prior information. By designing a measurement matrix that captures the positions of the missing data together with a sparse representation matrix, a compressive sensing (CS) based missing data recovery model is established. We then design a comparison standard to select the best sparse representation basis and introduce the average cross-correlation to examine the rationality of the established model. Furthermore, an improved fast matching pursuit algorithm is proposed to solve the model. Simulation results show that the proposed method can effectively recover missing WSN data.
Keywords: average cross-correlation; matching pursuit; missing data; wireless sensor networks
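The abstract describes the general recipe (a measurement matrix built from the observed positions, a sparse representation basis, and a matching-pursuit solver) without the paper's specific algorithm. Below is a minimal sketch of that generic formulation, assuming a DCT dictionary and scikit-learn's stock orthogonal matching pursuit in place of the authors' improved fast matching pursuit; all sizes and sparsity levels are invented for illustration.

```python
# Sketch: CS-style recovery of missing sensor readings (generic OMP + DCT basis,
# not the paper's improved solver).
import numpy as np
from scipy.fft import idct
from sklearn.linear_model import OrthogonalMatchingPursuit

rng = np.random.default_rng(0)
n = 256
Psi = idct(np.eye(n), norm="ortho", axis=0)      # columns form a DCT dictionary

# Ground-truth signal that is sparse in the DCT basis (3 active coefficients).
s_true = np.zeros(n)
s_true[[5, 23, 60]] = [1.0, -0.7, 0.5]
x_true = Psi @ s_true

# Missing-data pattern: only ~60% of the readings survive transmission.
observed = rng.random(n) < 0.6
Phi = np.eye(n)[observed]        # measurement matrix = rows picking observed positions
A, y = Phi @ Psi, x_true[observed]

# Recover a sparse coefficient vector with OMP, then reconstruct x = Psi @ s.
omp = OrthogonalMatchingPursuit(n_nonzero_coefs=5, fit_intercept=False).fit(A, y)
x_hat = Psi @ omp.coef_
rmse = np.sqrt(np.mean((x_hat[~observed] - x_true[~observed]) ** 2))
print(f"RMSE on the missing positions: {rmse:.2e}")
```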
2. Optimal Estimation of High-Dimensional Covariance Matrices with Missing and Noisy Data
Authors: Meiyin Wang, Wanzhou Ye. Advances in Pure Mathematics, 2024, No. 4, pp. 214-227.
The estimation of covariance matrices is very important in many fields, such as statistics. In real applications, data are frequently affected by high dimensionality and noise, yet most existing studies are based on complete data. This paper studies the optimal estimation of high-dimensional covariance matrices from missing and noisy samples under the norm. First, a model with sub-Gaussian additive noise is presented. The generalized sample covariance is then modified to define a hard thresholding estimator, and the minimax upper bound is derived. After that, the minimax lower bound is derived, and it is concluded that the proposed estimator is rate-optimal. Finally, numerical simulations are performed. The results show that, for missing samples with sub-Gaussian noise, the hard thresholding estimator outperforms the traditional estimator when the true covariance matrix is sparse.
Keywords: high-dimensional covariance matrix; missing data; sub-Gaussian noise; optimal estimation
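As a rough illustration of the hard-thresholding idea, the sketch below builds a pairwise-complete sample covariance from noisy data with entries missing at random and zeroes out the small off-diagonal entries. The pairwise-complete estimator and the threshold constant are stand-ins; the paper's generalized sample covariance and its minimax-tuned threshold are not reproduced.

```python
# Hard-thresholded covariance estimate from noisy, incomplete data (illustrative).
import numpy as np

rng = np.random.default_rng(1)
n, p = 400, 50

# Sparse (banded) true covariance with bandwidth 2.
idx = np.arange(p)
sigma_true = np.where(np.abs(idx[:, None] - idx[None, :]) <= 2,
                      0.5 ** np.abs(idx[:, None] - idx[None, :]), 0.0)

X = rng.multivariate_normal(np.zeros(p), sigma_true, size=n)
X += 0.2 * rng.standard_normal((n, p))      # sub-Gaussian additive noise
X[rng.random((n, p)) < 0.2] = np.nan        # 20% of entries missing at random

# Pairwise-complete sample covariance: each entry uses its own complete pairs.
S = np.empty((p, p))
for j in range(p):
    for k in range(p):
        mask = ~np.isnan(X[:, j]) & ~np.isnan(X[:, k])
        xj, xk = X[mask, j], X[mask, k]
        S[j, k] = np.mean((xj - xj.mean()) * (xk - xk.mean()))

# Hard thresholding: kill off-diagonal entries below ~sqrt(log p / n).
lam = 1.5 * np.sqrt(np.log(p) / n)
S_ht = np.where((np.abs(S) >= lam) | (idx[:, None] == idx[None, :]), S, 0.0)
print(f"spectral-norm error, raw:         {np.linalg.norm(S - sigma_true, 2):.3f}")
print(f"spectral-norm error, thresholded: {np.linalg.norm(S_ht - sigma_true, 2):.3f}")
```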
3. Comparison of two statistical methods for handling missing values of quantitative data in Bayesian N-of-1 trials: a simulation study
Authors: Jing-Bo Zhai, Tian-Ci Guo, Wei-Jie Yu. Medical Data Mining, 2024, No. 1, pp. 10-15.
Background: Missing data frequently occur in clinical studies. With the development of precision medicine there is increasing interest in N-of-1 trials, and Bayesian models are one of the main statistical methods for analyzing their data. This simulation study compared two statistical methods for handling missing values of quantitative data in Bayesian N-of-1 trials. Methods: Simulated N-of-1 trial data with different coefficients of autocorrelation, effect sizes and missing ratios were generated with SAS 9.1. Missing values were filled by mean filling and by regression filling under these different conditions using SPSS 25.0, and Bayesian models were built in WinBUGS 14 to estimate the posterior means. Results: When the missing ratio is relatively small, e.g. 5%, missing values have relatively little effect on the results. Therapeutic effects may be underestimated when the coefficient of autocorrelation increases and no filling is used, but may be overestimated when mean or regression filling is used; in this case the results after mean filling are closer to the actual effect than those after regression filling. With a moderate missing ratio, the estimated effect after mean filling is again closer to the actual effect than after regression filling. When the missing ratio is large (20%), missing data can lead to significant underestimation of the effect, and the estimate after regression filling is then closer to the actual effect than that after mean filling. Conclusion: Missing data can affect the therapeutic effects estimated with Bayesian models in N-of-1 trials. The present study suggests that mean filling can be used when the missing ratio is ≤10%; otherwise, regression filling may be preferable.
Keywords: N-of-1 trial; Bayesian; missing data; simulation study
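A toy version of the comparison, assuming an autocorrelated outcome with a constant treatment effect and values missing completely at random; mean filling and a simple lag-plus-treatment regression filling stand in for the SAS/SPSS/WinBUGS pipeline used in the study.

```python
# Mean filling vs. regression filling on a simulated autocorrelated outcome.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
rho, n, missing_ratio = 0.5, 200, 0.10
treat = (np.arange(n) // 10) % 2                  # alternating A/B blocks of 10
u = np.empty(n)
u[0] = rng.normal()
for t in range(1, n):                             # AR(1) noise, autocorrelation rho
    u[t] = rho * u[t - 1] + rng.normal(scale=np.sqrt(1 - rho ** 2))
y = 1.0 * treat + u                               # true treatment effect = 1.0

miss = rng.random(n) < missing_ratio
y_obs = np.where(miss, np.nan, y)

# Mean filling: every gap gets the overall observed mean.
y_mean = np.where(miss, np.nanmean(y_obs), y_obs)

# Regression filling: regress y_t on (treatment_t, y_{t-1}) over complete pairs,
# then predict each gap from its treatment indicator and the previous value.
ok = ~miss[1:] & ~miss[:-1]
reg = LinearRegression().fit(
    np.column_stack([treat[1:][ok], y_obs[:-1][ok]]), y_obs[1:][ok])
y_reg = y_obs.copy()
for t in np.where(miss)[0]:
    prev = y_reg[t - 1] if t > 0 and not np.isnan(y_reg[t - 1]) else np.nanmean(y_obs)
    y_reg[t] = reg.predict(np.array([[treat[t], prev]]))[0]

for name, filled in [("mean filling", y_mean), ("regression filling", y_reg)]:
    effect = filled[treat == 1].mean() - filled[treat == 0].mean()
    print(f"{name:18s} estimated effect = {effect:+.3f}  (true +1.000)")
```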
4. Ultra-High Dimensional Feature Selection and Mean Estimation under Missing at Random
Authors: Wanhui Li, Guangming Deng, Dong Pan. Open Journal of Statistics, 2023, No. 6, pp. 850-871.
Next Generation Sequencing (NGS) provides an effective basis for estimating the survival time of cancer patients, but it also brings very high data dimensionality, and some patients drop out of the study, leaving the data incomplete. A method is therefore needed for estimating the mean of a response variable with missing values in ultra-high dimensional data sets. In this paper we propose a two-stage ultra-high dimensional variable screening method, RF-SIS, based on random forest regression, which effectively addresses the difficulty of estimating missing values caused by excessive data dimension. After dimension reduction with RF-SIS, mean interpolation is applied to the missing responses. Simulation results show that, compared with directly deleting missing observations, the RF-SIS-MI estimates have significant advantages in terms of interval coverage proportion, average interval length, and average absolute deviation.
Keywords: ultrahigh-dimensional data; missing data; sure independent screening; mean estimation
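A plain two-stage sketch in the spirit of RF-SIS followed by mean interpolation: random-forest importances screen an ultra-high-dimensional feature set on the complete cases, and the missing responses are then filled with the observed mean. The screening statistic, the screening size d ≈ n/log n, and the simulated data are illustrative, not the paper's.

```python
# Two-stage workflow: RF-based feature screening, then mean imputation of y.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(3)
n, p = 200, 2000                       # far more features than samples
X = rng.standard_normal((n, p))
y = 2 * X[:, 0] - 1.5 * X[:, 3] + X[:, 7] + rng.standard_normal(n)
y[rng.random(n) < 0.25] = np.nan       # responses missing at random (dropout)
obs = ~np.isnan(y)

# Stage 1: screen features with random-forest importances on the complete cases.
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X[obs], y[obs])
d = int(n / np.log(n))                 # a common screening size, d ~ n / log n
keep = np.argsort(rf.feature_importances_)[::-1][:d]
print("active features 0, 3, 7 kept?", set([0, 3, 7]) <= set(keep.tolist()))

# Stage 2: simple mean imputation of the missing responses, then the estimate.
y_imp = np.where(obs, y, y[obs].mean())
print(f"estimated mean response: {y_imp.mean():.3f}")
```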
5. RAD-seq data reveals robust phylogeny and morphological evolutionary history of Rhododendron
Authors: Yuanting Shen, Gang Yao, Yunfei Li, Xiaoling Tian, Shiming Li, Nian Wang, Chengjun Zhang, Fei Wang, Yongpeng Ma. Horticultural Plant Journal (SCIE, CAS, CSCD), 2024, No. 3, pp. 866-878.
Rhododendron is famous for its high ornamental value. However, the genus is taxonomically difficult and the relationships within Rhododendron remain unresolved. In addition, the origin of key morphological characters of high horticultural value needs to be explored. Both problems greatly hinder the utilization of germplasm resources. Most previous studies attempting to disentangle the phylogeny of Rhododendron used only a few genomic markers and lacked large-scale sampling, resulting in low clade support and contradictory phylogenetic signals. Here we used restriction-site associated DNA sequencing (RAD-seq) data and morphological traits for 144 species of Rhododendron, representing all subgenera and most sections and subsections of this species-rich genus, to decipher its intricate evolutionary history and reconstruct ancestral states. Our results provide high resolution at the subgenus and section levels of Rhododendron based on RAD-seq data. Both the optimal phylogenetic tree and the split tree recovered five lineages within Rhododendron. Subg. Therorhodion (clade I) formed the basal lineage. Subg. Tsutsusi and Azaleastrum formed clade II and are sister groups. Clade III included all scaly rhododendron species. Subg. Pentanthera (clade IV) formed a sister group to Subg. Hymenanthes (clade V). Ancestral state reconstruction showed that the Rhododendron ancestor was a deciduous woody plant with terminal inflorescences, ten stamens, leaf blades without scales, and a broadly funnelform corolla of pink or purple color. This study demonstrates the power of RAD-seq data to resolve the evolutionary history of Rhododendron with high clade support, provides an example of resolving discordant signals in phylogenetic trees, and shows that RAD-seq data with large amounts of missing data remain feasible for deciphering intricate evolutionary relationships. Additionally, the reconstructed ancestral states of six important characters provide insights into the innovation of key characters in Rhododendron.
Keywords: Rhododendron; RAD-seq; missing data; quartet sampling (QS); ancestral state reconstruction
6. Generalized unscented Kalman filtering based radial basis function neural network for the prediction of ground radioactivity time series with missing data (Cited: 2)
Authors: 伍雪冬, 王耀南, 刘维亭, 朱志宇. Chinese Physics B (SCIE, EI, CAS, CSCD), 2011, No. 6, pp. 546-551.
On the assumption that random interruptions in the observation process are modeled by a sequence of independent Bernoulli random variables, we first generalize two kinds of nonlinear filtering methods with random interruption failures in the observation, based on the extended Kalman filter (EKF) and the unscented Kalman filter (UKF), abbreviated here as GEKF and GUKF, respectively. The nonlinear filtering model is then established by using radial basis function neural network (RBFNN) prototypes, with the network weights as the state equation and the output of the RBFNN as the observation equation. Finally, we treat the filtering problem under missing observed data as a special case of nonlinear filtering with random intermittent failures by setting each missing datum to zero, without pre-estimating the missing data, and use the GEKF-based RBFNN and the GUKF-based RBFNN to predict a ground radioactivity time series with missing data. Experimental results demonstrate that the predictions of the GUKF-based RBFNN agree well with the real ground radioactivity time series, while the predictions of the GEKF-based RBFNN diverge.
Keywords: prediction of time series with missing data; random interruption failures in the observation; neural network approximation
7. Comparison of Missing Data Imputation Methods in Time Series Forecasting (Cited: 1)
Authors: Hyun Ahn, Kyunghee Sun, Kwanghoon Pio Kim. Computers, Materials & Continua (SCIE, EI), 2022, No. 1, pp. 767-779.
Time series forecasting has become an important aspect of data analysis with many real-world applications. However, undesirable missing values are often encountered, which may adversely affect many forecasting tasks. In this study we evaluate and compare the effects of imputation methods for estimating missing values in a time series. Rather than simulating pseudo-missing data, our approach performs imputation on actual missing data and measures the performance of the forecasting models built from the imputed data. In the experiments, several time series forecasting models are trained on training datasets prepared with each imputation method, and the imputation methods are then evaluated by comparing the accuracy of the resulting forecasting models. The results from a total of four experimental cases show that the k-nearest neighbor technique is the most effective at reconstructing missing data and contributes positively to time series forecasting compared with the other imputation methods.
Keywords: missing data; imputation method; time series forecasting; LSTM
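A minimal stand-in for the k-nearest-neighbour imputation the study found most effective, assuming a lag-window embedding so that scikit-learn's KNNImputer can match similar temporal patterns; the window size and k are arbitrary choices.

```python
# k-NN imputation of a gappy time series via a sliding-window embedding.
import numpy as np
from sklearn.impute import KNNImputer

rng = np.random.default_rng(4)
n, window = 500, 24
t = np.arange(n)
series = np.sin(2 * np.pi * t / 48) + 0.1 * rng.standard_normal(n)

miss = rng.random(n) < 0.15
corrupted = np.where(miss, np.nan, series)

# Embed the series into overlapping windows so neighbours are similar patterns.
emb = np.lib.stride_tricks.sliding_window_view(corrupted, window).copy()
imputed_windows = KNNImputer(n_neighbors=5).fit_transform(emb)

# Un-embed: average the estimates each time step receives from its windows.
recon = np.zeros(n)
counts = np.zeros(n)
for i in range(emb.shape[0]):
    recon[i:i + window] += imputed_windows[i]
    counts[i:i + window] += 1
recon /= counts

rmse = np.sqrt(np.mean((recon[miss] - series[miss]) ** 2))
print(f"RMSE on the imputed points: {rmse:.3f}")
```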
8. Missing Data Imputations for Upper Air Temperature at 24 Standard Pressure Levels over Pakistan Collected from Aqua Satellite (Cited: 4)
Authors: Muhammad Usman Saleem, Sajid Rashid Ahmed. Journal of Data Analysis and Information Processing, 2016, No. 3, pp. 132-146.
This research was an effort to select the best imputation method for missing upper air temperature data over 24 standard pressure levels. Four imputation techniques were implemented: inverse distance weighting, bilinear, natural, and nearest-neighbour interpolation. The performance indicators adopted were the root mean square error (RMSE), absolute mean error (AME), correlation coefficient, and coefficient of determination (R²). We randomly withheld 30% of the total samples (324 in all) and predicted them from the remaining 70% of the data. Although all four interpolation methods performed well for imputing air temperature data (RMSE and AME below 1), the bilinear method was the most accurate, with the smallest errors. The RMSE for the bilinear method remained below 0.01 at all pressure levels except 1000 hPa, where it was 0.6, and its AME was below 0.1 at all pressure levels. A very strong correlation (>0.99) was found between the actual and predicted air temperature data, and the high coefficient of determination (0.99) obtained with bilinear interpolation indicates the best fit to the surface. Similar results were found for imputation with the natural interpolation method, but after examining scatter plots for each month, imputations with this method appeared somewhat less accurate in certain months than the bilinear method.
Keywords: missing data imputations; spatial interpolation; Aqua satellite; upper-level air temperature; AIRX3STML
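The hold-out evaluation described above can be mimicked with generic SciPy interpolators; the 'nearest', 'linear' and 'cubic' griddata modes stand in for the paper's nearest, bilinear and natural-neighbour schemes (inverse distance weighting is omitted), and the synthetic temperature field is invented.

```python
# Hold-out comparison of spatial interpolation methods with RMSE, AME and r.
import numpy as np
from scipy.interpolate import griddata

rng = np.random.default_rng(5)
lon, lat = np.meshgrid(np.linspace(60, 80, 30), np.linspace(23, 37, 30))
temp = 30 - 0.6 * (lat - 23) + 2 * np.sin(lon / 3)       # synthetic temperature field

pts = np.column_stack([lon.ravel(), lat.ravel()])
vals = temp.ravel()
test = rng.random(len(vals)) < 0.3                        # 30% held out as "missing"

for method in ("nearest", "linear", "cubic"):
    est = griddata(pts[~test], vals[~test], pts[test], method=method)
    ok = ~np.isnan(est)                                   # drop points outside the hull
    err = est[ok] - vals[test][ok]
    rmse = np.sqrt(np.mean(err ** 2))
    ame = np.mean(np.abs(err))
    r = np.corrcoef(est[ok], vals[test][ok])[0, 1]
    print(f"{method:8s} RMSE={rmse:.3f}  AME={ame:.3f}  r={r:.4f}")
```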
9. Estimation and test of restricted linear EV model with nonignorable missing covariates
Authors: TANG Lin-jun, ZHENG Sheng-chao, ZHOU Zhan-gong. Applied Mathematics (A Journal of Chinese Universities) (SCIE, CSCD), 2018, No. 3, pp. 344-358.
This paper deals with estimation and test procedures for restricted linear errors-in-variables (EV) models with nonignorable missing covariates. We develop a restricted weighted corrected least squares (WCLS) estimator based on the propensity score, which is fitted by an exponentially tilted likelihood method. The limiting distributions of the proposed estimators are discussed when the tilting parameter is known or unknown. To test the validity of the constraints, we construct two test procedures based on the corrected residual sum of squares and the empirical likelihood method and derive their asymptotic properties. Numerical studies are conducted to examine the finite-sample performance of the proposed methods.
Keywords: errors-in-variables model; nonignorable missing data; propensity score; smoothed empirical likelihood
10. Improved interpolation method based on singular spectrum analysis iteration and its application to missing data recovery
Authors: 王辉赞, 张韧, 刘巍, 王桂华, 金宝刚. Applied Mathematics and Mechanics (English Edition) (SCIE, EI), 2008, No. 10, pp. 1351-1361.
A novel interval quartering algorithm (IQA) is proposed to overcome the insufficiency of conventional singular spectrum analysis (SSA) iterative interpolation in selecting parameters, namely the number of principal components and the embedding dimension. Based on the improved SSA iterative interpolation, interpolation tests and comparative analyses are carried out on daily outgoing longwave radiation data. The results show that the IQA can find globally optimal parameters for an error curve with local oscillations and has the advantage of fast computation. The improved interpolation method is effective for interpolating missing data.
Keywords: singular spectrum analysis; outgoing longwave radiation; interpolation of missing data; interval quartering algorithm
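A bare-bones version of SSA iterative interpolation, with the window length and the number of principal components fixed by hand rather than selected by the paper's interval quartering algorithm.

```python
# SSA iterative gap filling: Hankel-embed, truncate the SVD, diagonally average,
# overwrite only the gaps, and repeat.
import numpy as np

def ssa_fill(x, L=60, r=4, n_iter=50):
    x = x.astype(float).copy()
    miss = np.isnan(x)
    x[miss] = np.nanmean(x)                    # crude initial fill
    K = len(x) - L + 1
    for _ in range(n_iter):
        H = np.column_stack([x[i:i + L] for i in range(K)])   # trajectory matrix
        U, s, Vt = np.linalg.svd(H, full_matrices=False)
        H_r = (U[:, :r] * s[:r]) @ Vt[:r]                     # rank-r approximation
        recon = np.zeros_like(x)                              # diagonal averaging
        counts = np.zeros_like(x)
        for j in range(K):
            recon[j:j + L] += H_r[:, j]
            counts[j:j + L] += 1
        recon /= counts
        x[miss] = recon[miss]                  # update only the gaps
    return x

rng = np.random.default_rng(6)
t = np.arange(600)
clean = np.sin(2 * np.pi * t / 90) + 0.3 * np.cos(2 * np.pi * t / 30)
noisy = clean + 0.1 * rng.standard_normal(len(t))
noisy[rng.random(len(t)) < 0.2] = np.nan

filled = ssa_fill(noisy)
gap = np.isnan(noisy)
print(f"RMSE in the gaps: {np.sqrt(np.mean((filled[gap] - clean[gap]) ** 2)):.3f}")
```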
11. Evaluating Methods for Dealing with Missing Outcomes in Discrete-Time Event History Analysis: A Simulation Study
Authors: Shahab Jolani, Nils L. M. van de Ven, Maryam Safarkhani, Mirjam Moerbeek. Open Journal of Statistics, 2021, No. 1, pp. 36-76.
Background: In discrete-time event history analysis, subjects are measured once each time period until they experience the event, prematurely drop out, or the study concludes. The measured event status of a subject in each time period therefore determines whether (s)he should be measured in subsequent time periods. For that reason, intermittent missing event status causes a problem because, unlike in other repeated measurement designs, it does not make sense to simply ignore the corresponding missing event status in the analysis (as long as the dropout is ignorable). Method: We used Monte Carlo simulation to evaluate and compare various alternatives for dealing with missing event status, including event occurrence recall, assumed event (non-)occurrence, case deletion, period deletion, and single and multiple imputation methods. We also examined the performance of these methods in the analysis of an empirical example on relapse to drug use. Result: The strategies assuming event (non-)occurrence and the recall strategy performed worst, with substantial parameter bias and a sharp decrease in coverage rate. Deletion methods suffered from either loss of power or undercoverage resulting from a biased standard error. Single imputation removed the bias but still showed undercoverage, while multiple imputation performed reasonably, with a negligible standard error bias leading to a gradual decrease in power. Conclusion: On the basis of the simulation results and the real example, we provide practical guidance to researchers on the best ways to deal with missing event history data.
Keywords: missing data; deletion; imputation; retrospective observations; survival analysis
12. Using Statistical Learning to Treat Missing Data: A Case of HIV/TB Co-Infection in Kenya
Authors: Joshua O. Mwaro, Linda Chaba, Collins Odhiambo. Journal of Data Analysis and Information Processing, 2020, No. 3, pp. 110-133.
In this study we investigate the effects of missing data when estimating HIV/TB co-infection. We revisit the concept of missing data and examine three available approaches for dealing with missingness, with the main objective of identifying the best method for correcting for missing data in the TB/HIV co-infection setting. We employ both an empirical data analysis and an extensive simulation study to examine the effects of missing data on accuracy, sensitivity, specificity, and training and test error for the different approaches. The novelty of this work lies in the use of modern statistical learning algorithms when treating missingness. In the empirical analysis, both the HIV data and the TB-HIV co-infection data were imputed using the different approaches. In the simulation study, 0% (complete case), 10%, 30%, 50% and 80% of the data were drawn randomly and replaced with missing values. The results show that complete cases alone gave a co-infection rate (95% confidence interval) of 29% (25%, 33%), the weighted method 27% (23%, 31%), the likelihood-based approach 26% (24%, 28%), and the multiple imputation approach 21% (20%, 22%). In conclusion, multiple imputation remains the best approach for dealing with missing data, and failure to apply it results in overestimation of the HIV/TB co-infection rate by 8%.
Keywords: missing data; HIV/TB co-infection; imputation; missing at random; count data
13. Fraction of Missing Information (γ) at Different Missing Data Fractions in the 2012 NAMCS Physician Workflow Mail Survey
Authors: Qiyuan Pan, Rong Wei. Applied Mathematics, 2016, No. 10, pp. 1057-1067.
In his 1987 classic book on multiple imputation (MI), Rubin used the fraction of missing information, γ, to define the relative efficiency (RE) of MI as RE = (1 + γ/m)^(-1/2), where m is the number of imputations, leading to the conclusion that a small m (≤5) would be sufficient for MI. However, evidence has been accumulating that many more imputations are needed. Why would the apparently sufficient m deduced from the RE actually be too small? The answer may lie with γ. In this research, γ was determined at missing data fractions (δ) of 4%, 10%, 20%, and 29% using the 2012 Physician Workflow Mail Survey of the National Ambulatory Medical Care Survey (NAMCS). The γ values were strikingly small, ranging on the order of 10^(-6) to 0.01. As δ increased, γ usually increased but sometimes decreased. How the data were analysed had the dominant effect on γ, overshadowing the effect of δ. The results suggest that it is impossible to predict γ from δ and that it may not be appropriate to use the γ-based RE to determine a sufficient m.
Keywords: multiple imputation; fraction of missing information (γ); sufficient number of imputations; missing data; NAMCS
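The quoted relative-efficiency formula is easy to evaluate directly; at the γ values the paper reports (10^(-6) to 0.01), RE is essentially 1 for any m, which is why RE alone appears to justify a very small number of imputations.

```python
# Evaluate RE = (1 + gamma/m)^(-1/2) at a few (gamma, m) combinations.
def relative_efficiency(gamma, m):
    return (1 + gamma / m) ** -0.5

for gamma in (1e-6, 0.01, 0.5):
    for m in (5, 20, 100):
        print(f"gamma={gamma:<7} m={m:<4} RE={relative_efficiency(gamma, m):.6f}")
```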
14. Improving Disease Prevalence Estimates Using Missing Data Techniques
Authors: Elhadji Moustapha Seck, Ngesa Owino Oscar, Abdou Ka Diongue. Open Journal of Statistics, 2016, No. 6, pp. 1110-1122.
The prevalence of a disease in a population is defined as the proportion of people who are infected. Selection bias in disease prevalence estimates occurs if non-participation in testing is correlated with disease status. Missing data are commonly encountered in medical research, yet they are often neglected or not properly handled during analysis, which may substantially bias the results, reduce study power, and lead to invalid conclusions. The goal of this study is to illustrate how to estimate prevalence in the presence of missing data. We consider a case where the variable of interest (the response) is binary, some of its observations are missing, and all covariates are fully observed; in this setting the statistic of interest is usually the prevalence. We develop a two-stage approach to improve the prevalence estimates: in the first stage a logistic regression model is used to predict the missing binary observations, and in the second stage the prevalence is recalculated from the observed data together with the imputed missing data. Such a model is of great interest for research involving HIV/AIDS, in which people often refuse to donate blood for testing yet are willing to provide other covariates. The prevalence estimation method is illustrated using simulated data and applied to HIV/AIDS data from the 2007 Kenya AIDS Indicator Survey.
Keywords: disease prevalence; missing data; non-participant; logistic regression model; prevalence estimates; HIV/AIDS
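A compact sketch of the two-stage idea on simulated data: a logistic regression fitted to the tested subjects predicts disease status for the non-participants, and the prevalence is recomputed from observed plus imputed statuses. The covariates, coefficients and refusal pattern are invented, and predicted probabilities are used here in place of imputed binary outcomes.

```python
# Two-stage prevalence estimation: predict the untested, then recompute.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(7)
n = 5000
age = rng.uniform(15, 65, n)
urban = rng.integers(0, 2, n)

# True infection model, and a testing (participation) pattern that depends on age.
p_true = 1 / (1 + np.exp(-(-3.0 + 0.03 * age + 0.5 * urban)))
status = rng.random(n) < p_true
tested = rng.random(n) < 1 / (1 + np.exp(-(1.0 - 0.02 * age)))

X = np.column_stack([age, urban])
clf = LogisticRegression().fit(X[tested], status[tested])    # stage 1

prev_complete = status[tested].mean()
imputed = clf.predict_proba(X[~tested])[:, 1]                # fill the untested
prev_two_stage = np.concatenate([status[tested], imputed]).mean()  # stage 2

print(f"true prevalence        {status.mean():.3f}")
print(f"complete-case estimate {prev_complete:.3f}")
print(f"two-stage estimate     {prev_two_stage:.3f}")
```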
15. Study on the Missing Data Mechanisms and Imputation Methods
Authors: Abdullah Z. Alruhaymi, Charles J. Kim. Open Journal of Statistics, 2021, No. 4, pp. 477-492.
The absence of some data values in any observed dataset has been a real hindrance to achieving valid results in statistical research. This paper is aimed at the widespread missing data problem faced by analysts and statisticians in academia and professional environments. Several data-driven methods were studied with the goal of obtaining accurate data. Projects that rely heavily on data face this missing data problem, and since machine learning models are only as good as the data used to train them, missing data have a real impact on the solutions developed for real-world problems. This work therefore attempts to address the problem under different missingness mechanisms by testing the effectiveness of both traditional and modern imputation techniques, measured by the loss of statistical power when the different approaches are used to handle missing data. The study establishes which methods perform best for the research problem and recommends Multivariate Imputation by Chained Equations (MICE) as the best approach for dealing with MAR missingness.
Keywords: missing data; mechanisms; imputation techniques; models
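A small MICE-style run, assuming scikit-learn's IterativeImputer (chained regressions) as a stand-in for the MICE implementation used in the dissertation, compared with naive mean imputation under a simulated MAR pattern.

```python
# MICE-like imputation vs. mean imputation under a MAR missingness pattern.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer, SimpleImputer

rng = np.random.default_rng(8)
n = 1000
x1 = rng.standard_normal(n)
x2 = 0.8 * x1 + 0.6 * rng.standard_normal(n)
x3 = 0.5 * x1 - 0.5 * x2 + rng.standard_normal(n)
X_full = np.column_stack([x1, x2, x3])

# MAR: x2 is more likely to be missing when x1 is large (x1 always observed).
X = X_full.copy()
X[rng.random(n) < 1 / (1 + np.exp(-x1)), 1] = np.nan
miss = np.isnan(X[:, 1])

for name, imputer in [("mean", SimpleImputer(strategy="mean")),
                      ("MICE-like", IterativeImputer(max_iter=20, random_state=0))]:
    X_imp = imputer.fit_transform(X)
    rmse = np.sqrt(np.mean((X_imp[miss, 1] - X_full[miss, 1]) ** 2))
    print(f"{name:9s} imputation RMSE for x2: {rmse:.3f}")
```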
16. Random Subspace Sampling for Classification with Missing Data
Authors: 曹云浩, 吴建鑫. Journal of Computer Science & Technology (SCIE, EI, CSCD), 2024, No. 2, pp. 472-486.
Many real-world datasets suffer from the unavoidable issue of missing values, so classification with missing data has to be handled carefully, since inadequate treatment of missing values causes large errors. In this paper we propose a random subspace sampling method, RSS, which samples missing items from the corresponding feature histogram distributions in random subspaces and is effective and efficient at different levels of missing data. Unlike most established approaches, RSS does not train on fixed imputed datasets. Instead, we design a dynamic training strategy in which the filled values change dynamically through resampling during training. Moreover, thanks to the sampling strategy, we design an ensemble testing strategy that combines the results of multiple runs of a single model, which is more efficient and resource-saving than previous ensemble methods. Finally, we combine these two strategies with the random subspace method, which makes the estimates more robust and accurate. The effectiveness of the proposed RSS method is validated by experimental studies.
Keywords: missing data; random subspace; neural network; ensemble learning
17. Outlier screening for ironmaking data on blast furnaces (Cited: 5)
Authors: Jun Zhao, Shao-fei Chen, Xiao-jie Liu, Xin Li, Hong-yang Li, Qing Lyu. International Journal of Minerals, Metallurgy and Materials (SCIE, EI, CAS, CSCD), 2021, No. 6, pp. 1001-1010.
Blast furnace data processing is prone to problems such as outliers. To overcome these problems and identify an improved method for processing blast furnace data, we conducted an in-depth study of such data. Based on data samples from selected iron and steel companies, the data were classified into types according to their characteristics, and appropriate methods were selected to address the deficiencies and outliers of the original blast furnace data. Linear interpolation was used to fill in the segmented continuous data, the K-nearest neighbor (KNN) algorithm was used to fill in correlated data with an internal law, and periodic statistical data were filled with the average. The error rate of the filling was low, and the goodness of fit was over 85%. For outlier screening, corresponding indicator parameters were added according to the continuity, correlation, and periodicity of the different data types, and a variety of algorithms were used for processing. Analysis of the screening results shows that a large amount of useful information in the data was retained while ineffective outliers were eliminated. Standardized processing of blast furnace big data, as the basis of applied research on blast furnace big data, can serve as an important means of improving data quality and retaining data value.
Keywords: blast furnace; data missing; outliers; data processing; data mining
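The three filling rules quoted above (interpolation for continuous trends, KNN for correlated variables, averages for periodic statistics) can be sketched on a toy frame; the column names and the periodic grouping are invented for illustration.

```python
# Three type-specific filling rules for process data with ~10% missing entries.
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

rng = np.random.default_rng(9)
n = 240
perm = rng.normal(3.0, 0.2, n)
df = pd.DataFrame({
    "hot_blast_temp": 1150 + np.cumsum(rng.normal(0, 2, n)),   # continuous trend
    "perm_index": perm,                                         # correlated pair ...
    "gas_util": 45 + 3 * perm + rng.normal(0, 0.3, n),          # ... with perm_index
    "hourly_output": np.tile(rng.normal(9000, 150, 24), 10),    # periodic (24-step)
})
for col in df.columns:                                          # knock out ~10%
    df.loc[rng.random(n) < 0.1, col] = np.nan

# Rule 1: continuous trend data -> linear interpolation in time.
df["hot_blast_temp"] = df["hot_blast_temp"].interpolate(method="linear",
                                                        limit_direction="both")

# Rule 2: correlated variables -> KNN imputation using each other.
corr_cols = ["perm_index", "gas_util"]
df[corr_cols] = KNNImputer(n_neighbors=5).fit_transform(df[corr_cols])

# Rule 3: periodic statistics -> fill with the average for the same cycle position.
hour = np.arange(n) % 24
df["hourly_output"] = df["hourly_output"].fillna(
    df.groupby(hour)["hourly_output"].transform("mean"))

print(df.isna().sum())
```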
18. Reconstruction of incomplete satellite SST data sets based on EOF method (Cited: 2)
Authors: DING Youzhuan, WEI Zhihui, MAO Zhihua, WANG Xiaofei, PAN Delu. Acta Oceanologica Sinica (SCIE, CAS, CSCD), 2009, No. 2, pp. 36-44.
For satellite remote sensing data obtained with visible and infrared band sensors, cloud coverage over the ocean often results in large-scale missing data in the inversion products, and thin clouds that are difficult to detect can make the inversion products abnormal. Alvera et al. (2005) proposed a method for reconstructing missing data based on an Empirical Orthogonal Function (EOF) decomposition, but that method cannot process images with extreme cloud coverage (more than 95%), requires a long time for reconstruction, and is strongly affected by abnormal data in the images. This paper therefore improves on that work by reconstructing the missing data through two applications of the EOF decomposition. First, abnormal times are detected by analyzing the temporal modes of the EOF decomposition, and the abnormal data are eliminated. Second, the data sets excluding the abnormal data are analyzed by EOF decomposition, and the temporal modes are filtered so as to enhance the ability to reconstruct images with little or no valid data. Finally, the method is applied to a large data set, namely 43 sea surface temperature (SST) satellite images of the Changjiang River (Yangtze River) estuary and its adjacent areas, and the total reconstruction root mean square error (RMSE) is 0.82°C. The improved EOF reconstruction method is shown to be robust for reconstructing missing and unreliable satellite data.
Keywords: EOF; SST; Changjiang River estuary; missing data sets
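A miniature EOF (truncated-SVD) gap-filling loop in the DINEOF spirit, without the paper's abnormal-scene detection or temporal-mode filtering; the rank, field size and cloud fraction are illustrative.

```python
# Iterative EOF reconstruction of a cloud-masked (time x space) SST field.
import numpy as np

def eof_fill(data, rank=3, n_iter=30):
    """data: 2-D array (time x space) with NaNs at cloudy pixels."""
    filled = data.copy()
    miss = np.isnan(data)
    filled[miss] = np.nanmean(data)
    for _ in range(n_iter):
        mean_field = filled.mean(axis=0)
        anom = filled - mean_field                          # work with anomalies
        U, s, Vt = np.linalg.svd(anom, full_matrices=False)
        recon = (U[:, :rank] * s[:rank]) @ Vt[:rank] + mean_field
        filled[miss] = recon[miss]                          # refill only the gaps
    return filled

rng = np.random.default_rng(10)
ntime, nspace = 43, 400
modes = rng.standard_normal((3, nspace))
amps = np.column_stack([np.sin(np.arange(ntime) / 3),
                        np.cos(np.arange(ntime) / 5),
                        rng.standard_normal(ntime)])
sst = 20 + amps @ modes + 0.1 * rng.standard_normal((ntime, nspace))

cloudy = rng.random(sst.shape) < 0.4                        # 40% cloud-masked pixels
sst_obs = np.where(cloudy, np.nan, sst)
sst_fill = eof_fill(sst_obs)
rmse = np.sqrt(np.mean((sst_fill[cloudy] - sst[cloudy]) ** 2))
print(f"reconstruction RMSE: {rmse:.3f} degC")
```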
19. A PID-incorporated Latent Factorization of Tensors Approach to Dynamically Weighted Directed Network Analysis (Cited: 1)
Authors: Hao Wu, Xin Luo, MengChu Zhou, Muhyaddin J. Rawa, Khaled Sedraoui, Aiiad Albeshri. IEEE/CAA Journal of Automatica Sinica (SCIE, EI, CSCD), 2022, No. 3, pp. 533-546.
A large-scale dynamically weighted directed network (DWDN), involving numerous entities and massive dynamic interactions, is an essential data source in many big-data-related applications, such as a terminal interaction pattern analysis system (TIPAS). It can be represented by a high-dimensional and incomplete (HDI) tensor whose entries are mostly unknown, yet such an HDI tensor contains a wealth of knowledge regarding desired patterns such as potential links in the DWDN. A latent factorization-of-tensors (LFT) model proves highly efficient at extracting this knowledge from an HDI tensor, which is commonly achieved via a stochastic gradient descent (SGD) solver. However, an SGD-based LFT model suffers from slow convergence, which impairs its efficiency on large-scale DWDNs. To address this issue, this work proposes a proportional-integral-derivative (PID)-incorporated LFT model. It constructs an adjusted instance error based on the PID control principle and substitutes it into the SGD solver to improve the convergence rate. Empirical studies on two DWDNs generated by a real TIPAS show that, compared with state-of-the-art models, the proposed model achieves a significant efficiency gain as well as highly competitive prediction accuracy on the task of missing link prediction for a given DWDN.
Keywords: big data; high-dimensional and incomplete (HDI) tensor; latent factorization-of-tensors (LFT); machine learning; missing data; optimization; proportional-integral-derivative (PID) controller
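A toy rendering of a PID-adjusted instance error inside an SGD solver for a rank-R factorization of a sparsely observed 3-way tensor. The factorization form, gains, learning rate and data are all invented and untuned, so this only illustrates where the proportional, integral and derivative terms enter the update, not the paper's model.

```python
# SGD for a CP-style factorization where each instance error is PID-adjusted.
import numpy as np

rng = np.random.default_rng(11)
I, J, K, R = 20, 20, 15, 3
A, B, C = (rng.standard_normal((d, R)) for d in (I, J, K))
dense = np.einsum('ir,jr,kr->ijk', A, B, C)

# Keep ~20% of the entries: a (small) high-dimensional, incomplete tensor.
idx = np.argwhere(rng.random(dense.shape) < 0.2)
vals = dense[tuple(idx.T)] + 0.01 * rng.standard_normal(len(idx))

U, V, W = (0.3 * rng.standard_normal((d, R)) for d in (I, J, K))
lr, lam = 0.01, 0.01                      # step size and L2 regularisation
kp, ki, kd = 1.0, 0.02, 0.1               # PID gains (illustrative, untuned)
err_sum = np.zeros(len(idx))
err_prev = np.zeros(len(idx))

for epoch in range(50):
    for m in rng.permutation(len(idx)):
        i, j, k = idx[m]
        e = vals[m] - np.sum(U[i] * V[j] * W[k])            # raw instance error
        err_sum[m] += e                                     # integral memory
        e_adj = kp * e + ki * err_sum[m] + kd * (e - err_prev[m])
        err_prev[m] = e                                     # derivative memory
        gU, gV, gW = V[j] * W[k], U[i] * W[k], U[i] * V[j]
        U[i] += lr * (e_adj * gU - lam * U[i])
        V[j] += lr * (e_adj * gV - lam * V[j])
        W[k] += lr * (e_adj * gW - lam * W[k])

pred = np.einsum('nr,nr,nr->n', U[idx[:, 0]], V[idx[:, 1]], W[idx[:, 2]])
print(f"training RMSE on observed entries: {np.sqrt(np.mean((vals - pred) ** 2)):.4f}")
```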
20. Target threat estimation based on discrete dynamic Bayesian networks with small samples (Cited: 1)
Authors: YE Fang, MAO Ying, LI Yibing, LIU Xinrui. Journal of Systems Engineering and Electronics (SCIE, EI, CSCD), 2022, No. 5, pp. 1135-1142.
The accuracy of target threat estimation has a great impact on command decision-making. The Bayesian network, as an effective way to deal with uncertainty, can be used to track changes in the target threat level. Unfortunately, the traditional discrete dynamic Bayesian network (DDBN) suffers from poor parameter learning and poor reasoning accuracy in a small-sample environment where part of the prior information is missing. Considering the finiteness and discreteness of DDBN parameters, a fuzzy k-nearest neighbor (KNN) algorithm based on the correlation of feature quantities (CF-FKNN) is proposed for DDBN parameter learning. First, the correlation between feature quantities is calculated, and then a KNN algorithm with fuzzy weights is introduced to fill in the missing data. On this basis, a reasonable DDBN structure is constructed using expert experience to complete DDBN parameter learning and reasoning. Simulation results show that the CF-FKNN algorithm can accurately fill in the data when samples are severely missing and improves DDBN parameter learning in that setting. With the proposed method, the final target threat assessment results are reasonable and meet the needs of engineering applications.
Keywords: discrete dynamic Bayesian network (DDBN); parameter learning; missing data filling; Bayesian estimation