Funding: Supported by the Graduate Funded Project (No. JY2022A017).
Abstract: The frequent missing values in radar-derived time-series tracks of aerial targets (RTT-AT) lead to significant challenges in subsequent data-driven tasks. However, the majority of imputation research focuses on random missing (RM), which differs significantly from the common missing patterns of RTT-AT. Methods designed for RM may experience performance degradation or failure when applied to RTT-AT imputation. Conventional autoregressive deep learning methods are prone to error accumulation and long-term dependency loss. In this paper, a non-autoregressive imputation model that addresses missing value imputation for two common missing patterns in RTT-AT is proposed. Our model consists of two probabilistic sparse diagonal masking self-attention (PSDMSA) units and a weight fusion unit. It learns missing values by combining the representations output by the two units, aiming to minimize the difference between the estimated values and the actual values. The PSDMSA units effectively capture temporal dependencies and attribute correlations between time steps, improving imputation quality. The weight fusion unit automatically updates the weights of the output representations from the two units to obtain a more accurate final representation. The experimental results indicate that, despite varying missing rates in the two missing patterns, our model consistently outperforms other methods in imputation performance and exhibits a low frequency of deviations in estimates for specific missing entries. Compared to the state-of-the-art autoregressive deep learning imputation model Bidirectional Recurrent Imputation for Time Series (BRITS), our proposed model reduces mean absolute error (MAE) by 31%-50%. Additionally, the model attains a training speed 4 to 8 times faster than both BRITS and a standard Transformer model trained on the same dataset. Finally, the findings from the ablation experiments demonstrate that the PSDMSA units, the weight fusion unit, the cascade network design, and the imputation loss all enhance imputation performance and confirm the efficacy of our design.
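The abstract does not give the PSDMSA equations, so the following is only a minimal Python sketch of the underlying idea of diagonally masked self-attention: the diagonal of the attention-score matrix is masked so that each time step is reconstructed from the other steps rather than from itself. The probabilistic sparse selection, the second unit, and the weight fusion step of the actual model are omitted, and all weight matrices here are random placeholders.

```python
import numpy as np

def diag_masked_self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over time steps with the diagonal masked,
    so each step is represented only from the other steps (a common device in
    non-autoregressive imputation; simplified, not the full PSDMSA unit)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # (T, T) attention scores
    np.fill_diagonal(scores, -np.inf)             # forbid self-attention on the diagonal
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                            # representation used to estimate missing entries

# Toy usage: T time steps, D attributes, missing entries assumed zero-filled in X.
rng = np.random.default_rng(0)
T, D, H = 24, 4, 8
X = rng.normal(size=(T, D))
Wq, Wk, Wv = (rng.normal(size=(D, H)) for _ in range(3))
rep = diag_masked_self_attention(X, Wq, Wk, Wv)
print(rep.shape)  # (24, 8)
```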
Funding: Supported by the National Natural Science Foundation of China (No. 61871400) and the Natural Science Foundation of Jiangsu Province, China (No. BK20171401).
Abstract: In wireless sensor networks (WSNs), the performance of related applications is highly dependent on the quality of the data collected. Unfortunately, missing data is almost inevitable during data acquisition and transmission. Existing methods often rely on prior information, such as low-rank characteristics or spatiotemporal correlation, when recovering missing WSN data. However, in realistic application scenarios, it is very difficult to obtain such prior information from incomplete data sets. Therefore, we aim to recover missing WSN data effectively without relying on prior information. By designing a measurement matrix that captures the positions of the missing data together with a sparse representation matrix, a compressive sensing (CS) based missing data recovery model is established. Then, we design a comparison standard to select the best sparse representation basis and introduce average cross-correlation to examine the rationality of the established model. Furthermore, an improved fast matching pursuit algorithm is proposed to solve the model. Simulation results show that the proposed method can effectively recover missing WSN data.
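The abstract does not specify the sparse representation basis or the details of the improved fast matching pursuit, so the sketch below only illustrates the general CS formulation it describes, assuming a DCT basis and off-the-shelf orthogonal matching pursuit: a measurement matrix built from the observed positions, a sparsifying dictionary, and sparse recovery of the full signal.

```python
import numpy as np
from scipy.fft import idct
from sklearn.linear_model import OrthogonalMatchingPursuit

rng = np.random.default_rng(1)
N = 128
Psi = idct(np.eye(N), norm='ortho', axis=0)        # sparse representation basis: x = Psi @ s

# A compressible "sensor" signal, sparse in the DCT domain by construction.
s_true = np.zeros(N)
s_true[[3, 7, 15]] = [1.0, -0.8, 0.5]
x_true = Psi @ s_true

observed = np.sort(rng.choice(N, size=96, replace=False))  # 25% of readings lost in transmission
Phi = np.eye(N)[observed]                          # measurement matrix: picks the observed positions
y = Phi @ x_true                                   # the incomplete data actually received

omp = OrthogonalMatchingPursuit(n_nonzero_coefs=10, fit_intercept=False)
omp.fit(Phi @ Psi, y)                              # solve y ~= (Phi Psi) s for a sparse s
x_hat = Psi @ omp.coef_

missing = np.setdiff1d(np.arange(N), observed)
print("max error on missing entries:", np.abs(x_hat[missing] - x_true[missing]).max())
```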
Abstract: Ethical statements were not included in the published version of the following articles that appeared in previous issues of Journal of Integrative Agriculture. The appropriate statements provided by the Authors are included below.
Abstract: The estimation of covariance matrices is very important in many fields, such as statistics. In real applications, data are frequently affected by high dimensionality and noise. However, most relevant studies are based on complete data. This paper studies the optimal estimation of high-dimensional covariance matrices from missing and noisy samples under a matrix norm. First, a model with sub-Gaussian additive noise is presented. The generalized sample covariance is then modified to define a hard thresholding estimator, and the minimax upper bound is derived. After that, the minimax lower bound is derived, and it is concluded that the estimator presented in this article is rate-optimal. Finally, numerical simulation analysis is performed. The results show that, for missing samples with sub-Gaussian noise, if the true covariance matrix is sparse, the hard thresholding estimator outperforms the traditional estimation method.
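As a rough illustration of the thresholding step (not the paper's generalized sample covariance for missing data, and with an ad-hoc threshold rather than the rate-optimal one), a hard-thresholding covariance estimator can be sketched in a few lines:

```python
import numpy as np

def hard_threshold_covariance(X, tau):
    """Hard-threshold the sample covariance: entries with |s_ij| <= tau are set to 0
    (diagonal kept), exploiting sparsity of the true covariance matrix."""
    S = np.cov(X, rowvar=False)                 # usual sample covariance (p x p)
    T = np.where(np.abs(S) > tau, S, 0.0)       # kill small off-diagonal noise
    np.fill_diagonal(T, np.diag(S))             # variances are left untouched
    return T

# Toy usage with a sparse (tridiagonal) true covariance and additive observation noise.
rng = np.random.default_rng(2)
n, p = 200, 50
Sigma = np.eye(p) + 0.4 * np.eye(p, k=1) + 0.4 * np.eye(p, k=-1)
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
X_noisy = X + 0.1 * rng.normal(size=X.shape)    # sub-Gaussian additive noise
tau = 2 * np.sqrt(np.log(p) / n)                # a common threshold scaling, chosen ad hoc here
S_hat = hard_threshold_covariance(X_noisy, tau)
print("Frobenius error, thresholded vs raw:",
      np.linalg.norm(S_hat - Sigma), np.linalg.norm(np.cov(X_noisy, rowvar=False) - Sigma))
```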
Abstract: Missing data presents a significant challenge in statistical analysis and machine learning, often resulting in biased outcomes and diminished efficiency. This comprehensive review investigates various imputation techniques, categorizing them into three primary approaches: deterministic methods, probabilistic models, and machine learning algorithms. Traditional techniques, including mean or mode imputation, regression imputation, and last observation carried forward, are evaluated alongside more contemporary methods such as multiple imputation, expectation-maximization, and deep learning strategies. The strengths and limitations of each approach are outlined. Key considerations for selecting appropriate methods, based on data characteristics and research objectives, are discussed. The importance of evaluating imputation’s impact on subsequent analyses is emphasized. This synthesis of recent advancements and best practices provides researchers with a robust framework for effectively handling missing data, thereby improving the reliability of empirical findings across diverse disciplines.
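As a concrete contrast between the deterministic and model-based families discussed above, the following sketch compares mean imputation with scikit-learn's iterative, regression-based imputer on a toy correlated dataset; the data and parameters are illustrative only.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (activates IterativeImputer)
from sklearn.impute import SimpleImputer, IterativeImputer

rng = np.random.default_rng(3)
n = 500
x1 = rng.normal(size=n)
x2 = 2.0 * x1 + rng.normal(scale=0.5, size=n)      # correlated column the regression can exploit
X_full = np.column_stack([x1, x2])

X_miss = X_full.copy()
X_miss[rng.random(n) < 0.2, 1] = np.nan             # 20% of x2 missing completely at random

mask = np.isnan(X_miss[:, 1])
for name, imp in [("mean", SimpleImputer(strategy="mean")),
                  ("iterative", IterativeImputer(random_state=0))]:
    X_hat = imp.fit_transform(X_miss)
    rmse = np.sqrt(np.mean((X_hat[mask, 1] - X_full[mask, 1]) ** 2))
    print(f"{name} imputation RMSE: {rmse:.3f}")    # regression-based should be clearly lower
```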
Funding: Supported by the National Natural Science Foundation of China (No. 81973705).
Abstract: Background: Missing data frequently occur in clinical studies. With the development of precision medicine, there is increased interest in N-of-1 trials. Bayesian models are one of the main statistical methods for analyzing data from N-of-1 trials. This simulation study aimed to compare two statistical methods for handling missing values of quantitative data in Bayesian N-of-1 trials. Methods: Simulated N-of-1 trial data with different coefficients of autocorrelation, effect sizes, and missing ratios were generated with the SAS 9.1 system. The missing values were filled with mean filling and regression filling, respectively, under the different coefficients of autocorrelation, effect sizes, and missing ratios using SPSS 25.0. Bayesian models were built to estimate the posterior means with WinBUGS 14. Results: When the missing ratio is relatively small, e.g., 5%, missing values have relatively little effect on the results. Therapeutic effects may be underestimated when the coefficient of autocorrelation increases and no filling is used. However, they may be overestimated when mean or regression filling is used, and the results after mean filling are closer to the actual effect than those after regression filling. In the case of a moderate missing ratio, the estimated effect after mean filling is closer to the actual effect than that after regression filling. When a large missing ratio (20%) occurs, missing data can lead to significant underestimation of the effect; in this case, the estimated effect after regression filling is closer to the actual effect than that after mean filling. Conclusion: Missing data can affect the estimated therapeutic effects obtained with Bayesian models in N-of-1 trials. The present study suggests that mean filling can be used when the missing ratio is ≤10%; otherwise, regression filling may be preferable.
Abstract: In this paper, a model averaging method is proposed for varying-coefficient models with responses missing at random, by establishing a weight selection criterion based on cross-validation. Under certain regularity conditions, it is proved that the proposed method is asymptotically optimal in the sense of achieving the minimum squared error.
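The paper's criterion for varying-coefficient models is not reproduced in the abstract; the sketch below only shows the generic mechanics of a cross-validation-based model-averaging weight choice: collect out-of-fold predictions from candidate models and pick simplex weights that minimize the cross-validated squared error. The candidate models here are arbitrary placeholders.

```python
import numpy as np
from scipy.optimize import minimize
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(4)
n = 300
X = rng.normal(size=(n, 5))
y = X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.3, size=n)

candidates = [LinearRegression(), Ridge(alpha=10.0)]
# Out-of-fold predictions for each candidate model (columns of P).
P = np.column_stack([cross_val_predict(m, X, y, cv=5) for m in candidates])

def cv_criterion(w):
    return np.mean((y - P @ w) ** 2)              # cross-validated squared error of the average

k = P.shape[1]
res = minimize(cv_criterion, x0=np.full(k, 1.0 / k), method="SLSQP",
               bounds=[(0.0, 1.0)] * k,
               constraints=[{"type": "eq", "fun": lambda w: np.sum(w) - 1.0}])
print("selected weights:", np.round(res.x, 3))
```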
Abstract: Fear of missing out (FoMO) is a term introduced in 2004 to describe a phenomenon observed on social networking sites. FoMO involves two processes: first, the perception of missing out, followed by compulsive behavior to maintain these social connections. We are interested in understanding the complex construct of FoMO and its relation to the need to belong and to form stable interpersonal relationships. It is associated with a range of negative life experiences and feelings, as it is considered a problematic attachment to social media. We provide a general review of the literature and summarize the findings in relation to mental health, social functioning, sleep, academic performance and productivity, neurodevelopmental disorders, and physical well-being. We also discuss the treatment options available for FoMO based on cognitive behavior therapy. It is imperative that new findings on FoMO are communicated to the clinical community, as it has diagnostic implications and could be a confounding variable in those who do not respond to treatment as usual.
Funding: Supported by the Key Discipline Construction Program of the Beijing Municipal Commission of Education (XK10008043).
Abstract: Most real application processes are complex nonlinear systems with incomplete information. It is difficult to estimate a model by assuming that the data set is governed by a single global model. Moreover, in real processes, the available data set usually contains missing values. To overcome the shortcomings of global modeling and missing data values, a new modeling method is proposed. First, an incomplete data set with missing values is partitioned into several clusters by a K-means with soft constraints (KSC) algorithm, which incorporates soft constraints to enable clustering with missing values. Then a local model for each group is developed using an SVR algorithm that adopts a missing value insensitive (MVI) kernel to address the missing value estimation problem. For each local model, its valid region is also obtained. Simulation results prove the effectiveness of the local models and the estimation algorithm.
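The KSC soft constraints and the MVI kernel are specific to the paper and not described in the abstract. Purely to illustrate the first step, clustering an incomplete data set, the sketch below uses a simplified K-means whose distances are computed over commonly observed attributes only; this is a hypothetical stand-in, not the authors' algorithm.

```python
import numpy as np

def partial_distance(a, b):
    """Squared distance over the dimensions observed in both points, rescaled to full length."""
    both = ~np.isnan(a) & ~np.isnan(b)
    if not both.any():
        return np.inf
    d = a[both] - b[both]
    return d @ d * (a.size / both.sum())

def kmeans_incomplete(X, k, n_iter=20, seed=0):
    rng = np.random.default_rng(seed)
    centers = np.nan_to_num(X[rng.choice(len(X), k, replace=False)])
    for _ in range(n_iter):
        labels = np.array([np.argmin([partial_distance(x, c) for c in centers]) for x in X])
        for j in range(k):
            members = X[labels == j]
            if len(members):
                centers[j] = np.nanmean(members, axis=0)   # ignore missing entries per attribute
    return labels, centers

# Toy usage: two Gaussian clusters with 15% missing entries.
rng = np.random.default_rng(5)
X = np.vstack([rng.normal(0, 1, (50, 3)), rng.normal(4, 1, (50, 3))])
X[rng.random(X.shape) < 0.15] = np.nan
labels, centers = kmeans_incomplete(X, k=2)
print(np.bincount(labels))   # roughly a 50 / 50 split is expected
```

A local regression model (SVR in the paper) would then be fitted within each cluster.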
Abstract: The first thunderstorm of the season appeared in southern Shenyang on May 2, 2010. It did not cause a severe lightning disaster in the Shenyang region, but the forecast service performed poorly because the thunderstorm was not forecast accurately. In this paper, the reasons for the missed report of this thunderstorm were analyzed, and the thunderstorm potential was analyzed by means of mesoscale analysis techniques, providing technical indices and a vantage point for the prediction of thunderstorm potential. The results showed that the reasons for the missed report were as follows: surface temperatures in the preceding period were persistently low, which did not favor the development of convective weather; the ability to interpret and analyze numerical forecast products should be improved; the forecast of the T639 model was better than that of the Japanese numerical forecast; and the study and application of mesoscale analysis techniques should be strengthened. This service was formally launched after the thunderstorm of June 1, 2010.
Abstract: Missing vias have been a defect in semiconductor manufacturing, especially in foundries. Their solution can be rather attractive for yield improvement in relatively mature technologies, since each percentage point of improvement means a significant profit margin enhancement. However, the root cause of missing via defects is not easy to determine, since many factors, such as defocus, material re-deposition, and inadequate development, can lead to missing vias. Therefore, knowing the exact cause of each defect type is the key. In this paper, we present the analysis methodology used in our company. In the experiments, we have observed three types of missing vias. The first type consists of large, usually circular, areas of missing patterns, primarily located near the wafer edge. The second type consists of isolated sites with single partially opened or completely unopened vias. The third type consists of relatively small circular areas within which the entire via pattern is missing. We first tried optimizing the developing recipe and found that the first type of missing via can be largely removed by tuning the rinse process, which improves the cleaning efficiency of the developing residue. However, this method does not remove missing vias of the second and third types. We found that the second type of missing via is related to local defocus caused by topographical distribution. To resolve the third type of missing via defects, we performed extensive experiments with different types of developer nozzles and different types of photomasks, and found no distinct dependence of the defect density on either the nozzle or the mask type. Moreover, we also studied the defect density of three resists with different resolution capabilities and found a correlation between the defect density and the resist resolution: in general, lower-resolution resists also have lower defect density. The results are presented in the paper.
Abstract: In his 1987 classic book on multiple imputation (MI), Rubin used the fraction of missing information, γ, to define the relative efficiency (RE) of MI as RE = (1 + γ/m)^(-1/2), where m is the number of imputations, leading to the conclusion that a small m (≤5) would be sufficient for MI. However, evidence has been accumulating that many more imputations are needed. Why would the apparently sufficient m deduced from the RE actually be too small? The answer may lie with γ. In this research, γ was determined at fractions of missing data (δ) of 4%, 10%, 20%, and 29% using the 2012 Physician Workflow Mail Survey of the National Ambulatory Medical Care Survey (NAMCS). The γ values were strikingly small, ranging on the order of 10^(-6) to 0.01. As δ increased, γ usually increased but sometimes decreased. How the data were analysed had the dominant effect on γ, overshadowing the effect of δ. The results suggest that it is impossible to predict γ using δ and that it may not be appropriate to use the γ-based RE to determine a sufficient m.
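To see why this formula makes a handful of imputations appear sufficient, the short calculation below evaluates RE = (1 + γ/m)^(-1/2) for a few values of γ and m:

```python
def relative_efficiency(gamma, m):
    """Rubin's relative efficiency of m imputations given fraction of missing information gamma."""
    return (1.0 + gamma / m) ** -0.5

for gamma in (0.01, 0.3, 0.5):
    for m in (5, 20, 100):
        print(f"gamma={gamma:<4}  m={m:<3}  RE={relative_efficiency(gamma, m):.4f}")
# Even at gamma = 0.5, five imputations already give RE ~ 0.95, which is why the RE
# criterion alone suggests a small m; the article argues gamma itself is hard to predict.
```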
Abstract: In this study, we investigate the effects of missing data when estimating HIV/TB co-infection. We revisit the concept of missing data and examine three available approaches for dealing with missingness. The main objective is to identify the best method for correcting missing data in the TB/HIV co-infection setting. We employ both empirical data analysis and an extensive simulation study to examine the effects of missing data on accuracy, sensitivity, specificity, and train and test error for the different approaches. The novelty of this work hinges on the use of modern statistical learning algorithms when treating missingness. In the empirical analysis, imputations were performed for both the HIV data and the TB-HIV co-infection data, and the missing values were imputed using the different approaches. In the simulation study, 0% (complete case), 10%, 30%, 50%, and 80% of the data were drawn randomly and replaced with missing values. Results show that complete cases only had a co-infection rate (95% confidence interval) of 29% (25%, 33%), the weighted method 27% (23%, 31%), the likelihood-based approach 26% (24%, 28%), and the multiple imputation approach 21% (20%, 22%). In conclusion, MI remains the best approach for dealing with missing data, and failure to apply it results in overestimation of the HIV/TB co-infection rate by 8%.
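The direction of the bias can be reproduced with a toy example: when the chance of being observed and the chance of co-infection both depend on the same covariate, the complete-case rate is inflated, while a model-based imputation that uses the covariate recovers the true rate. This is a single-imputation sketch with made-up data, not the multiple-imputation or likelihood machinery used in the study.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(6)
n = 5000
x = rng.normal(size=n)                                    # covariate, e.g. a clinical marker
p = 1 / (1 + np.exp(-(x - 1.0)))                          # true co-infection probability
y = rng.binomial(1, p)                                    # true co-infection indicator
print("true rate:", y.mean())

# Missingness depends on x (missing at random): low-marker patients are tested less often.
observed = rng.random(n) < 1 / (1 + np.exp(-(x + 0.5)))
print("complete-case rate:", y[observed].mean())          # biased upward

# A simple (single) model-based imputation using the observed relationship y ~ x.
clf = LogisticRegression().fit(x[observed].reshape(-1, 1), y[observed])
p_hat = clf.predict_proba(x.reshape(-1, 1))[:, 1]
y_imp = np.where(observed, y, rng.binomial(1, p_hat))     # draw imputations for the missing y
print("imputed-data rate:", y_imp.mean())                 # close to the true rate
```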
Funding: This study was supported by the National Climbing Project "Sequence Stratigraphy and Sea-level Eustasy on the Margins of the Chinese Palaeocontinent" (No. 8502208) and the China Postdoctoral Foundation.
Abstract: Taking the Upper Permian reef section on the carbonate platform margin at Ziyun, Guizhou, as an example, the paper discusses the missing time of a hiatus surface, an important problem in chemical sequence stratigraphy, using the concept of cosmic chemistry. The paper then proposes a series of new concepts for chemical sequence stratigraphy, including the condensation surface, the relative compaction factor, and the time missing factor. Finally, a quantitative curve of Late Permian relative sea-level change in the Ziyun area is presented with time coordinates.
Funding: This work is funded by the Newton Institutional Links 2020-21 project 623718881, jointly by the British Council and the National Research Council of Thailand (www.britishcouncil.org). The corresponding author is the project PI.
Abstract: The problem of missing values has long been studied by researchers working in data science and bioinformatics, especially in the analysis of gene expression data, which facilitates early detection of cancer. Many studies have shown improvements made by excluding samples with missing information from the analysis process, while others have tried to fill the gaps with plausible values. While the former is simple, the latter safeguards against information loss. For the latter, a neighbour-based (KNN) approach has proven more effective than other global estimators. This paper extends that line of work by introducing a new summarization method into the KNN model. It is the first study to apply the concept of the ordered weighted averaging (OWA) operator to this problem context. In particular, two variations of OWA aggregation are proposed and evaluated against their baseline and other neighbour-based models. Using missing value ratios from 1%-20% and a set of six published gene expression datasets, the experimental results suggest that the new methods usually provide more accurate estimates than the compared methods. At missing rates of 5% and 20%, the best NRMSE scores averaged across datasets are 0.65 and 0.69, while the best scores obtained by existing techniques included in this study are 0.80 and 0.84, respectively.
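The two OWA variations proposed in the paper are not specified in the abstract, so the sketch below only shows the generic recipe with a hypothetical triangular (soft-median) weight vector: find the nearest rows over commonly observed columns, order the donor values for each missing entry, and aggregate them with OWA weights.

```python
import numpy as np

def owa_weights(k):
    # Triangular weights peaking at the middle of the ordered list, a soft-median style
    # OWA operator (a hypothetical choice; the paper proposes its own variants).
    idx = np.arange(1, k + 1)
    w = (k + 1) / 2 - np.abs(idx - (k + 1) / 2)
    return w / w.sum()

def knn_owa_impute(X, k=5):
    """Fill each missing entry with an OWA aggregation of values from the k nearest rows,
    where distances are computed over commonly observed columns."""
    X = X.astype(float)
    X_out = X.copy()
    for i in range(X.shape[0]):
        miss = np.isnan(X[i])
        if not miss.any():
            continue
        dists = np.full(X.shape[0], np.inf)        # partial distances to every other row
        for j in range(X.shape[0]):
            if j == i:
                continue
            both = ~np.isnan(X[i]) & ~np.isnan(X[j])
            if both.any():
                d = X[i, both] - X[j, both]
                dists[j] = np.sqrt(d @ d / both.sum())
        order = np.argsort(dists)
        for col in np.where(miss)[0]:
            donors = [X[j, col] for j in order
                      if np.isfinite(dists[j]) and not np.isnan(X[j, col])][:k]
            if donors:
                v = np.sort(donors)[::-1]          # OWA weights apply to the ordered values
                X_out[i, col] = v @ owa_weights(len(v))
    return X_out

# Toy usage on a small expression-like matrix with 10% missing values.
rng = np.random.default_rng(7)
X = rng.normal(size=(60, 8))
X[rng.random(X.shape) < 0.1] = np.nan
print(np.isnan(knn_owa_impute(X, k=5)).sum())      # 0 if every entry found at least one donor
```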