Covariance matrix estimation is important in many fields, including statistics. In real applications, data are frequently affected by high dimensionality and noise. However, most relevant studies assume complete data. This paper studies the optimal estimation of high-dimensional covariance matrices from missing and noisy samples under the norm. First, a model with sub-Gaussian additive noise is presented. The generalized sample covariance is then modified to define a hard thresholding estimator, and the minimax upper bound is derived. Next, the minimax lower bound is derived, which shows that the proposed estimator is rate-optimal. Finally, numerical simulations are performed. The results show that, for missing samples with sub-Gaussian noise, the hard thresholding estimator outperforms the traditional estimation method when the true covariance matrix is sparse.
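The two ingredients named in this abstract, a covariance estimate built from incomplete samples and entrywise hard thresholding, can be illustrated with a minimal sketch. It is not the paper's estimator: the pairwise-complete covariance, the 1/n normalization, the function names, and the threshold `lam` are all assumptions made for illustration, and no correction for the additive noise is attempted.

```python
import numpy as np

def pairwise_covariance(x):
    """Covariance from pairwise-complete observations (NaN marks a missing entry).

    Assumes every pair of columns is observed together at least once.
    """
    observed = ~np.isnan(x)
    mask = observed.astype(float)
    filled = np.where(observed, x, 0.0)
    # counts[j, k] = number of rows in which both column j and column k are observed
    counts = mask.T @ mask
    # each column mean uses only that column's observed entries
    means = filled.sum(axis=0) / mask.sum(axis=0)
    centered = np.where(observed, x - means, 0.0)
    return (centered.T @ centered) / counts

def hard_threshold(sigma, lam):
    """Zero every off-diagonal entry with |entry| < lam; keep the diagonal."""
    out = np.where(np.abs(sigma) >= lam, sigma, 0.0)
    np.fill_diagonal(out, np.diag(sigma))
    return out

# Tiny example: 3 samples, 2 variables, one missing value.
x = np.array([[1.0, 2.0],
              [3.0, np.nan],
              [5.0, 6.0]])
sigma = pairwise_covariance(x)
sparse_sigma = hard_threshold(sigma, lam=5.0)
```

In practice the threshold is usually tied to the dimension and the sample size; the value used here is arbitrary.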
Missing data present a significant challenge in statistical analysis and machine learning, often resulting in biased outcomes and reduced efficiency. This comprehensive review investigates various imputation techniques, categorizing them into three primary approaches: deterministic methods, probabilistic models, and machine learning algorithms. Traditional techniques, including mean or mode imputation, regression imputation, and last observation carried forward, are evaluated alongside more contemporary methods such as multiple imputation, expectation-maximization, and deep learning strategies. The strengths and limitations of each approach are outlined. Key considerations for selecting appropriate methods, based on data characteristics and research objectives, are discussed. The importance of evaluating the impact of imputation on subsequent analyses is emphasized. This synthesis of recent advances and best practices provides researchers with a robust framework for handling missing data effectively, thereby improving the reliability of empirical findings across diverse disciplines.
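Two of the traditional deterministic techniques listed above, mean imputation and last observation carried forward (LOCF), can be sketched in a few lines. The function names and the use of NaN as the missing-value marker are assumptions made for illustration; the review itself does not prescribe an implementation.

```python
import numpy as np

def mean_impute(x):
    """Replace each NaN with the mean of the observed values in its column."""
    x = np.asarray(x, dtype=float)
    col_means = np.nanmean(x, axis=0)
    return np.where(np.isnan(x), col_means, x)

def locf(series):
    """Carry the last observed value forward; leading NaNs stay NaN."""
    series = np.asarray(series, dtype=float)
    # index of the most recent observed position at or before each entry
    idx = np.where(~np.isnan(series), np.arange(len(series)), 0)
    np.maximum.accumulate(idx, out=idx)
    return series[idx]

table = np.array([[1.0, 2.0],
                  [np.nan, 4.0],
                  [3.0, np.nan]])
imputed = mean_impute(table)
carried = locf([1.0, np.nan, np.nan, 4.0, np.nan])
```

Both methods are cheap but ignore the uncertainty introduced by imputation, which is one reason the abstract stresses evaluating the effect of imputation on subsequent analyses.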
This paper proposes a model averaging method for varying-coefficient models with responses missing at random, using a weight selection criterion based on cross-validation. Under certain regularity conditions, the proposed method is proved to be asymptotically optimal in the sense of achieving the minimum squared error.
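The weight-selection idea can be illustrated in its simplest form: given cross-validated predictions from two candidate models, choose the weight in [0, 1] that minimizes the squared cross-validation error. This sketch covers only that step, under the assumption of two candidates and fully observed responses; the function name is hypothetical, and the handling of missing responses and varying coefficients in the paper is not reproduced.

```python
import numpy as np

def cv_weight_two_models(y, pred_a, pred_b):
    """Weight w in [0, 1] minimizing sum((y - w*pred_a - (1-w)*pred_b)**2).

    pred_a and pred_b are assumed to be cross-validated (held-out) predictions.
    """
    y, pred_a, pred_b = (np.asarray(v, dtype=float) for v in (y, pred_a, pred_b))
    diff = pred_a - pred_b
    denom = diff @ diff
    if denom == 0.0:
        # identical predictions: every weight gives the same error
        return 0.5
    # unconstrained least-squares solution, then projected onto [0, 1];
    # the objective is convex in w, so clipping yields the constrained optimum
    w = ((y - pred_b) @ diff) / denom
    return float(np.clip(w, 0.0, 1.0))

w_half = cv_weight_two_models([1.0, 1.0], [2.0, 2.0], [0.0, 0.0])
w_full = cv_weight_two_models([1.0, 2.0, 3.0], [1.0, 2.0, 3.0], [0.0, 0.0, 0.0])
```

With more than two candidate models the same criterion becomes a quadratic program over the weight simplex.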
The first thunderstorm occurred in southern Shenyang on May 2, 2010. It did not cause a severe lightning disaster in the Shenyang region, but the forecast service performed poorly because the thunderstorm was not forecast accurately. This paper analyzes the reasons for the missed forecast and assesses thunderstorm potential using mesoscale analysis techniques, providing technical indices and a reference point for predicting thunderstorm potential. The results indicate the following reasons for the missed forecast: the surface temperature in the preceding period was persistently low, which was unfavorable to the development of convective weather; the ability to interpret and analyze numerical forecast products needs improvement; the forecast of the T639 model was better than that of the Japanese numerical forecast; and the study and application of mesoscale analysis techniques should be strengthened. This service was formally launched after the thunderstorm on June 1, 2010.