The estimation of covariance matrices is very important in many fields, such as statistics. In real applications, data are frequently influenced by high dimensions and noise. However, most relevant studies are based o...The estimation of covariance matrices is very important in many fields, such as statistics. In real applications, data are frequently influenced by high dimensions and noise. However, most relevant studies are based on complete data. This paper studies the optimal estimation of high-dimensional covariance matrices based on missing and noisy sample under the norm. First, the model with sub-Gaussian additive noise is presented. The generalized sample covariance is then modified to define a hard thresholding estimator , and the minimax upper bound is derived. After that, the minimax lower bound is derived, and it is concluded that the estimator presented in this article is rate-optimal. Finally, numerical simulation analysis is performed. The result shows that for missing samples with sub-Gaussian noise, if the true covariance matrix is sparse, the hard thresholding estimator outperforms the traditional estimate method.展开更多
The frequent missing values in radar-derived time-series tracks of aerial targets(RTT-AT)lead to significant challenges in subsequent data-driven tasks.However,the majority of imputation research focuses on random mis...The frequent missing values in radar-derived time-series tracks of aerial targets(RTT-AT)lead to significant challenges in subsequent data-driven tasks.However,the majority of imputation research focuses on random missing(RM)that differs significantly from common missing patterns of RTT-AT.The method for solving the RM may experience performance degradation or failure when applied to RTT-AT imputation.Conventional autoregressive deep learning methods are prone to error accumulation and long-term dependency loss.In this paper,a non-autoregressive imputation model that addresses the issue of missing value imputation for two common missing patterns in RTT-AT is proposed.Our model consists of two probabilistic sparse diagonal masking self-attention(PSDMSA)units and a weight fusion unit.It learns missing values by combining the representations outputted by the two units,aiming to minimize the difference between the missing values and their actual values.The PSDMSA units effectively capture temporal dependencies and attribute correlations between time steps,improving imputation quality.The weight fusion unit automatically updates the weights of the output representations from the two units to obtain a more accurate final representation.The experimental results indicate that,despite varying missing rates in the two missing patterns,our model consistently outperforms other methods in imputation performance and exhibits a low frequency of deviations in estimates for specific missing entries.Compared to the state-of-the-art autoregressive deep learning imputation model Bidirectional Recurrent Imputation for Time Series(BRITS),our proposed model reduces mean absolute error(MAE)by 31%~50%.Additionally,the model attains a training speed that is 4 to 8 times faster when compared to both BRITS and a standard Transformer model when trained on the same dataset.Finally,the findings from the ablation experiments demonstrate that the PSDMSA,the weight fusion unit,cascade network design,and imputation loss enhance imputation performance and confirm the efficacy of our design.展开更多
In this paper, a model averaging method is proposed for varying-coefficient models with response missing at random by establishing a weight selection criterion based on cross-validation. Under certain regularity condi...In this paper, a model averaging method is proposed for varying-coefficient models with response missing at random by establishing a weight selection criterion based on cross-validation. Under certain regularity conditions, it is proved that the proposed method is asymptotically optimal in the sense of achieving the minimum squared error.展开更多
Background:Missing data are frequently occurred in clinical studies.Due to the development of precision medicine,there is an increased interest in N-of-1 trial.Bayesian models are one of main statistical methods for a...Background:Missing data are frequently occurred in clinical studies.Due to the development of precision medicine,there is an increased interest in N-of-1 trial.Bayesian models are one of main statistical methods for analyzing the data of N-of-1 trials.This simulation study aimed to compare two statistical methods for handling missing values of quantitative data in Bayesian N-of-1 trials.Methods:The simulated data of N-of-1 trials with different coefficients of autocorrelation,effect sizes and missing ratios are obtained by SAS 9.1 system.The missing values are filled with mean filling and regression filling respectively in the condition of different coefficients of autocorrelation,effect sizes and missing ratios by SPSS 25.0 software.Bayesian models are built to estimate the posterior means by Winbugs 14 software.Results:When the missing ratio is relatively small,e.g.5%,missing values have relatively little effect on the results.Therapeutic effects may be underestimated when the coefficient of autocorrelation increases and no filling is used.However,it may be overestimated when mean or regression filling is used,and the results after mean filling are closer to the actual effect than regression filling.In the case of moderate missing ratio,the estimated effect after mean filling is closer to the actual effect compared to regression filling.When a large missing ratio(20%)occurs,data missing can lead to significantly underestimate the effect.In this case,the estimated effect after regression filling is closer to the actual effect compared to mean filling.Conclusion:Data missing can affect the estimated therapeutic effects using Bayesian models in N-of-1 trials.The present study suggests that mean filling can be used under situation of missing ratio≤10%.Otherwise,regression filling may be preferable.展开更多
文摘The estimation of covariance matrices is very important in many fields, such as statistics. In real applications, data are frequently influenced by high dimensions and noise. However, most relevant studies are based on complete data. This paper studies the optimal estimation of high-dimensional covariance matrices based on missing and noisy sample under the norm. First, the model with sub-Gaussian additive noise is presented. The generalized sample covariance is then modified to define a hard thresholding estimator , and the minimax upper bound is derived. After that, the minimax lower bound is derived, and it is concluded that the estimator presented in this article is rate-optimal. Finally, numerical simulation analysis is performed. The result shows that for missing samples with sub-Gaussian noise, if the true covariance matrix is sparse, the hard thresholding estimator outperforms the traditional estimate method.
基金supported by Graduate Funded Project(No.JY2022A017).
文摘The frequent missing values in radar-derived time-series tracks of aerial targets(RTT-AT)lead to significant challenges in subsequent data-driven tasks.However,the majority of imputation research focuses on random missing(RM)that differs significantly from common missing patterns of RTT-AT.The method for solving the RM may experience performance degradation or failure when applied to RTT-AT imputation.Conventional autoregressive deep learning methods are prone to error accumulation and long-term dependency loss.In this paper,a non-autoregressive imputation model that addresses the issue of missing value imputation for two common missing patterns in RTT-AT is proposed.Our model consists of two probabilistic sparse diagonal masking self-attention(PSDMSA)units and a weight fusion unit.It learns missing values by combining the representations outputted by the two units,aiming to minimize the difference between the missing values and their actual values.The PSDMSA units effectively capture temporal dependencies and attribute correlations between time steps,improving imputation quality.The weight fusion unit automatically updates the weights of the output representations from the two units to obtain a more accurate final representation.The experimental results indicate that,despite varying missing rates in the two missing patterns,our model consistently outperforms other methods in imputation performance and exhibits a low frequency of deviations in estimates for specific missing entries.Compared to the state-of-the-art autoregressive deep learning imputation model Bidirectional Recurrent Imputation for Time Series(BRITS),our proposed model reduces mean absolute error(MAE)by 31%~50%.Additionally,the model attains a training speed that is 4 to 8 times faster when compared to both BRITS and a standard Transformer model when trained on the same dataset.Finally,the findings from the ablation experiments demonstrate that the PSDMSA,the weight fusion unit,cascade network design,and imputation loss enhance imputation performance and confirm the efficacy of our design.
文摘In this paper, a model averaging method is proposed for varying-coefficient models with response missing at random by establishing a weight selection criterion based on cross-validation. Under certain regularity conditions, it is proved that the proposed method is asymptotically optimal in the sense of achieving the minimum squared error.
基金supported by the National Natural Science Foundation of China (No.81973705).
文摘Background:Missing data are frequently occurred in clinical studies.Due to the development of precision medicine,there is an increased interest in N-of-1 trial.Bayesian models are one of main statistical methods for analyzing the data of N-of-1 trials.This simulation study aimed to compare two statistical methods for handling missing values of quantitative data in Bayesian N-of-1 trials.Methods:The simulated data of N-of-1 trials with different coefficients of autocorrelation,effect sizes and missing ratios are obtained by SAS 9.1 system.The missing values are filled with mean filling and regression filling respectively in the condition of different coefficients of autocorrelation,effect sizes and missing ratios by SPSS 25.0 software.Bayesian models are built to estimate the posterior means by Winbugs 14 software.Results:When the missing ratio is relatively small,e.g.5%,missing values have relatively little effect on the results.Therapeutic effects may be underestimated when the coefficient of autocorrelation increases and no filling is used.However,it may be overestimated when mean or regression filling is used,and the results after mean filling are closer to the actual effect than regression filling.In the case of moderate missing ratio,the estimated effect after mean filling is closer to the actual effect compared to regression filling.When a large missing ratio(20%)occurs,data missing can lead to significantly underestimate the effect.In this case,the estimated effect after regression filling is closer to the actual effect compared to mean filling.Conclusion:Data missing can affect the estimated therapeutic effects using Bayesian models in N-of-1 trials.The present study suggests that mean filling can be used under situation of missing ratio≤10%.Otherwise,regression filling may be preferable.