Abstract
Objective To compare, in a simulation study, two methods for imputing randomly missing values in periodic time series: a time-series imputation method based on periodic information (hereafter the periodicity-based method) and cubic spline interpolation. Methods Stationary, periodic time series were simulated in SAS, and values were then deleted at random. The two methods were compared across different missing proportions at the same series length, and across different series lengths at the same missing proportion. Imputation error was quantified with the normalized root mean square error (NRMSE) and the root mean square error (RMSE). Results At the same series length, the imputation error of both methods increased as the missing proportion increased. The NRMSE and RMSE of the periodicity-based method were smaller than those of the spline method (P < 0.05), except for the RMSE at a missing proportion of 30%, where the difference between the two methods was not statistically significant. At the same missing proportion, the difference between the two methods was not statistically significant for short series, whereas the periodicity-based method outperformed the spline method for longer series. Conclusion Overall, the periodicity-based method performs well, and better than the spline method, when imputing missing values in time series with a definite periodicity.
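The comparison described in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' SAS implementation. The abstract does not give the form of the simulated series, the details of the periodicity-based method, the spline boundary conditions, or the NRMSE normalizer. The sketch therefore assumes a sine wave plus Gaussian noise, a stand-in periodic method that averages the observed values at the same phase of the cycle, a natural cubic spline, and normalization by the range of the true values at the missing positions. All function names and parameter values are hypothetical.

```python
import bisect
import math
import random


def simulate_series(n, period=12, noise_sd=0.5, seed=0):
    """Stationary periodic series: sine wave plus Gaussian noise (assumed form)."""
    rng = random.Random(seed)
    return [10 + 3 * math.sin(2 * math.pi * t / period) + rng.gauss(0, noise_sd)
            for t in range(n)]


def random_missing(n, proportion, seed=1):
    """Choose interior indices to delete at random; both endpoints stay observed."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(1, n - 1), round(n * proportion)))


def periodic_impute(y, missing, period=12):
    """Stand-in periodic method: mean of observed values at the same phase."""
    miss = set(missing)
    observed = [v for t, v in enumerate(y) if t not in miss]
    filled = {}
    for t in missing:
        same_phase = [y[s] for s in range(t % period, len(y), period)
                      if s not in miss]
        pool = same_phase or observed  # fall back to the overall observed mean
        filled[t] = sum(pool) / len(pool)
    return filled


def natural_cubic_spline(x, a):
    """Return an evaluator for the natural cubic spline through (x[i], a[i])."""
    n = len(x) - 1
    h = [x[i + 1] - x[i] for i in range(n)]
    alpha = [0.0] * (n + 1)
    for i in range(1, n):
        alpha[i] = 3 / h[i] * (a[i + 1] - a[i]) - 3 / h[i - 1] * (a[i] - a[i - 1])
    # Forward sweep of the tridiagonal system for the second-order coefficients.
    l, mu, z = [1.0] * (n + 1), [0.0] * (n + 1), [0.0] * (n + 1)
    for i in range(1, n):
        l[i] = 2 * (x[i + 1] - x[i - 1]) - h[i - 1] * mu[i - 1]
        mu[i] = h[i] / l[i]
        z[i] = (alpha[i] - h[i - 1] * z[i - 1]) / l[i]
    # Back substitution; c[0] = c[n] = 0 is the natural boundary condition.
    b, c, d = [0.0] * n, [0.0] * (n + 1), [0.0] * n
    for j in range(n - 1, -1, -1):
        c[j] = z[j] - mu[j] * c[j + 1]
        b[j] = (a[j + 1] - a[j]) / h[j] - h[j] * (c[j + 1] + 2 * c[j]) / 3
        d[j] = (c[j + 1] - c[j]) / (3 * h[j])

    def evaluate(t):
        j = min(max(bisect.bisect_right(x, t) - 1, 0), n - 1)
        dt = t - x[j]
        return a[j] + b[j] * dt + c[j] * dt ** 2 + d[j] * dt ** 3

    return evaluate


def spline_impute(y, missing):
    """Fit the spline to the observed points and evaluate it at the gaps."""
    miss = set(missing)
    xs = [t for t in range(len(y)) if t not in miss]
    spline = natural_cubic_spline(xs, [y[t] for t in xs])
    return {t: spline(t) for t in missing}


def rmse(truth, filled):
    """Root mean square error over the imputed positions."""
    return math.sqrt(sum((filled[t] - truth[t]) ** 2 for t in filled) / len(filled))


def nrmse(truth, filled):
    """RMSE divided by the range of the true values at the gaps (assumed)."""
    values = [truth[t] for t in filled]
    return rmse(truth, filled) / (max(values) - min(values))


if __name__ == "__main__":
    series = simulate_series(120)
    for proportion in (0.1, 0.2, 0.3):
        gaps = random_missing(len(series), proportion)
        for name, method in (("periodic", periodic_impute), ("spline", spline_impute)):
            filled = method(series, gaps)
            print(f"{proportion:.0%} {name:8s} RMSE={rmse(series, filled):.3f} "
                  f"NRMSE={nrmse(series, filled):.3f}")
```

A single run on one simulated series only illustrates the mechanics. Reproducing the kind of comparison reported in the abstract would require repeating the simulation many times for each series length and missing proportion, then testing the difference in error between the two methods.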
Source
《中国卫生统计》
CSCD
Peking University Core Journals (北大核心)
2012, Issue 4, pp. 475-477 (3 pages)
Chinese Journal of Health Statistics
Funding
Supported by the National Natural Science Foundation of China, 2008 (Grant No. 30872182)