Abstract
To handle missing values in electric energy data, this paper proposes a data augmentation (DA) multiple imputation algorithm. The algorithm is based on a Bayesian constant-mean model that incorporates the time-series structure of the data. First, the expectation-maximization (EM) imputation algorithm computes imputed values for the missing entries, and these values serve as the initial values of the imputation. Next, because electric energy data vary over time, a multiple imputation model based on the constant-mean model is constructed, and the Bayesian method predicts several imputed values for each missing entry. The final imputed value is obtained by jointly analyzing the observation error variance and the state error variance, which yields multiple complete data sets. Under different missing-data rates, the method is compared with EM imputation and with DA multiple imputation based on Bayesian linear regression. The improved method takes the temporal fluctuation of electric energy data into account. It yields lower prediction error, smaller fluctuation, and more stable imputation results, and so effectively improves the imputation accuracy for missing electric energy data.
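The abstract does not state the model equations. The sketch below therefore assumes the standard constant-mean (first-order polynomial, or local level) dynamic linear model, in which an observation equals a slowly drifting level plus noise. Several simplifications are used here and are not taken from the paper:

- Filling gaps with the observed mean stands in for the paper's EM step.
- The observation error variance V and the state error variance W are estimated once, by method of moments, from that initial fill. They are then held fixed rather than resampled as a full DA sampler would do.
- All function names are hypothetical.

Each imputation pass runs the Bayesian forward update. At a missing time point it draws a value from the one-step predictive distribution, whose variance Q = C + W + V combines the state and observation error variances.

```python
import numpy as np


def em_initial_fill(y):
    """Stand-in for the EM step (assumption): for one normal series with
    values missing at random, EM converges to the observed-data mean,
    so every gap is filled with that mean."""
    filled = np.array(y, dtype=float)
    filled[np.isnan(filled)] = np.nanmean(filled)
    return filled


def estimate_variances(filled):
    """Method-of-moments estimates for the constant-mean (local level) model.
    First differences satisfy Var(dy) = W + 2V and Cov(dy_t, dy_{t-1}) = -V."""
    d = np.diff(filled)
    d = d - d.mean()
    gamma0 = np.mean(d * d)
    gamma1 = np.mean(d[1:] * d[:-1])
    V = max(-gamma1, 1e-8)           # observation error variance
    W = max(gamma0 - 2.0 * V, 1e-8)  # state error variance
    return V, W


def constant_mean_multiple_imputation(y, n_imputations=5, seed=0):
    """Return an (n_imputations, len(y)) array of completed series.

    Model:  y_t  = mu_t + v_t,      v_t ~ N(0, V)
            mu_t = mu_{t-1} + w_t,  w_t ~ N(0, W)
    V and W are estimated once from the initial fill and then held fixed,
    a simplification of a full data augmentation sampler."""
    y = np.asarray(y, dtype=float)
    missing = np.isnan(y)
    init = em_initial_fill(y)
    V, W = estimate_variances(init)
    rng = np.random.default_rng(seed)
    completed = np.empty((n_imputations, y.size))
    for k in range(n_imputations):
        m, C = init[0], W  # starting state mean taken from the initial fill
        for t in range(y.size):
            R = C + W      # prior variance of mu_t
            f, Q = m, R + V  # forecast variance combines state and observation error
            if missing[t]:
                obs = rng.normal(f, np.sqrt(Q))  # one draw from the predictive distribution
            else:
                obs = y[t]
            A = R / Q             # adaptive coefficient (Kalman gain)
            m = m + A * (obs - f)  # posterior mean of mu_t
            C = A * V             # posterior variance, equal to R - A**2 * Q
            completed[k, t] = obs
    return completed


# Usage: two gaps in a short, slowly rising series (illustrative values only).
y = np.array([10.0, 10.4, np.nan, 10.9, 11.2, np.nan, 11.8, 12.1])
draws = constant_mean_multiple_imputation(y, n_imputations=5)
final = draws.mean(axis=0)  # pooled point estimate at each position
```

Each row of `draws` is one complete data set. Averaging across rows gives a single imputed value per gap, which is the point-estimate part of Rubin's pooling rules. The spread across rows reflects the uncertainty that the multiple imputation is meant to preserve.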
Source
Journal of Guangxi University of Science and Technology, 2017, No. 3, pp. 103-109 (7 pages)
Funding
Supported by the China Southern Power Grid Science and Technology Project (GZHKJ00000024)
Keywords
Bayesian constant-mean model
DA multiple imputation method
missing electric energy data