
Imputation Methods for Missing Values in Time Series with Significant Periodicity (cited by: 2)
Abstract
Objective: To impute randomly missing values in periodic time series in a simulation study, and to compare the imputation performance of a periodicity-based imputation method for time series (hereafter the periodic imputation method) with that of cubic spline interpolation.
Methods: SAS was used to simulate stationary, periodic time series and to introduce randomly missing values. The two methods were compared under two designs: the same series length with different missing percentages, and the same missing percentage with different series lengths. Imputation error was quantified with the normalized root mean square error (NRMSE) and the root mean square error (RMSE).
Results: For a fixed series length, the imputation error of both methods increased as the missing percentage increased. Except for the RMSE at a missing percentage of 30%, where the difference between the two methods was not statistically significant, the NRMSE and RMSE of the periodic imputation method were smaller than those of the spline method (P < 0.05). For a fixed missing percentage, the difference between the two methods was not statistically significant when the series was short, while the periodic imputation method performed better than the spline method when the series was long.
Conclusion: Overall, the periodic imputation method performs well for imputing missing values in time series with a definite periodicity.
Source: Chinese Journal of Health Statistics (《中国卫生统计》), CSCD, Peking University Core Journal, 2012, Issue 4, pp. 475-477 (3 pages)
Funding: National Natural Science Foundation of China, 2008 (grant no. 30872182)
Keywords: time series; missing data; periodic imputation; spline imputation method
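
The comparison described in the abstract can be illustrated with a small simulation. The sketch below is only an assumption-laden illustration, not the authors' SAS procedure: the abstract does not specify the periodic imputation rule, so `impute_periodic` (a hypothetical name) simply averages the observed values at the same phase of the cycle; linear interpolation stands in for the cubic spline so that the example needs only the Python standard library; and NRMSE is taken as RMSE divided by the range of the true series, which is one common convention and may differ from the paper's definition. All function names, the sinusoidal model, and the parameter values are illustrative.

```python
import math
import random


def simulate(n, period, noise_sd, rng):
    """Stationary periodic series: a sinusoid plus Gaussian noise (assumed model)."""
    return [10.0 + 3.0 * math.sin(2 * math.pi * t / period) + rng.gauss(0.0, noise_sd)
            for t in range(n)]


def make_missing(series, fraction, rng):
    """Delete a random fraction of points; endpoints are kept so interpolation is defined."""
    k = int(round(fraction * len(series)))
    missing = set(rng.sample(range(1, len(series) - 1), k))
    observed = [None if t in missing else x for t, x in enumerate(series)]
    return observed, sorted(missing)


def impute_periodic(observed, period):
    """Assumed rule: fill a gap with the mean of observed values at the same phase."""
    known = [x for x in observed if x is not None]
    overall_mean = sum(known) / len(known)
    filled = list(observed)
    for t, x in enumerate(observed):
        if x is None:
            same_phase = [observed[s] for s in range(t % period, len(observed), period)
                          if observed[s] is not None]
            # Fall back to the overall mean if the whole phase is missing.
            filled[t] = sum(same_phase) / len(same_phase) if same_phase else overall_mean
    return filled


def impute_linear(observed):
    """Linear interpolation between neighbouring observed points (stand-in for the spline)."""
    filled = list(observed)
    known = [t for t, x in enumerate(observed) if x is not None]
    for left, right in zip(known, known[1:]):
        for t in range(left + 1, right):
            w = (t - left) / (right - left)
            filled[t] = (1 - w) * observed[left] + w * observed[right]
    return filled


def rmse(truth, filled, missing):
    """Root mean square error over the imputed positions only."""
    return math.sqrt(sum((truth[t] - filled[t]) ** 2 for t in missing) / len(missing))


def nrmse(truth, filled, missing):
    """RMSE divided by the range of the true series (assumed normalization)."""
    return rmse(truth, filled, missing) / (max(truth) - min(truth))


if __name__ == "__main__":
    rng = random.Random(0)
    truth = simulate(n=240, period=12, noise_sd=0.5, rng=rng)
    observed, missing = make_missing(truth, fraction=0.2, rng=rng)
    for name, filled in [("periodic", impute_periodic(observed, 12)),
                         ("linear", impute_linear(observed))]:
        print(name, round(rmse(truth, filled, missing), 3),
              round(nrmse(truth, filled, missing), 3))
```

Varying `n` and `fraction` in the final block mirrors the two designs in the abstract (fixed length with varying missing percentage, and fixed missing percentage with varying length). A single run like this says nothing about statistical significance; that would require repeating the simulation many times, as the study did.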


