Abstract
Data streams are continuous, massive, and fast-arriving, so traditional imputation algorithms cannot be applied to them. This paper proposes a missing-data imputation algorithm based on nested sliding windows. Because sensor-network data streams are time-sensitive, the algorithm uses nested sliding windows to select, as sample data, the readings that are both highly spatially correlated with the missing value and nearest to it, and then imputes the missing value in one of two ways. First, Pearson correlation is computed to analyze the spatial correlation of the data, and the nested sliding windows sample the data related to the missing value to obtain strongly correlated data; the MKNN algorithm then imputes the value accurately from these samples. Pearson correlation analysis and nested-window sampling greatly reduce the size of the data sample, which improves the real-time performance of missing-data handling. For missing data without strong spatial correlation, the algorithm exploits the strong temporal correlation among readings collected over a short period and fills the gap with a simple linear correlation method, which reduces algorithmic complexity. Experimental results show that the algorithm imputes missing values in data streams accurately and in real time.
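The two-case procedure described in the abstract can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: all sensor series are assumed to be time-aligned, with only the newest target reading missing and every earlier reading observed. The outer window (`outer`) is used for the Pearson test and the inner window (`inner`) limits the KNN candidates to the most recent samples; the window lengths, the threshold `r_min`, `k`, and the function names `pearson`, `linear_fill`, and `impute_last` are all hypothetical. Plain k-nearest-neighbor averaging stands in for MKNN, whose details the abstract does not give, and a least-squares line extrapolated one step stands in for the linear correlation method.

```python
from math import sqrt
from typing import Dict, List, Optional


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length, non-empty sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant series has no defined correlation; treat as uncorrelated
    return sxy / sqrt(sxx * syy)


def linear_fill(history: List[float]) -> float:
    """Fit y = a + b*t by least squares over `history`, then extrapolate one step ahead."""
    m = len(history)
    if m == 1:
        return float(history[0])
    ts = range(m)
    mt, my = (m - 1) / 2, sum(history) / m
    slope = sum((t - mt) * (y - my) for t, y in zip(ts, history)) / sum((t - mt) ** 2 for t in ts)
    return my + slope * (m - mt)  # value of the fitted line at t = m


def impute_last(target: List[Optional[float]], neighbors: Dict[str, List[float]],
                outer: int = 20, inner: int = 5, k: int = 3, r_min: float = 0.8) -> float:
    """Estimate target[-1] (None): outer window for correlation, inner window for sampling."""
    now = len(target) - 1
    if target[now] is not None or now == 0:
        raise ValueError("expects observed history followed by a single missing reading")
    past = list(range(max(0, now - outer), now))  # outer window: timestamps before `now`
    y = [target[t] for t in past]                  # assumed fully observed in this sketch

    # Keep only neighbors whose readings are strongly (positively) correlated with the target.
    strong = [name for name, s in neighbors.items()
              if pearson([s[t] for t in past], y) >= r_min]
    if not strong:
        # Case 2: weak spatial correlation, so rely on short-term temporal correlation.
        return linear_fill(y[-inner:])

    # Case 1: KNN over the inner window, i.e. the `inner` most recent outer-window timestamps.
    query = [neighbors[name][now] for name in strong]

    def dist(t: int) -> float:
        # Euclidean distance between the strong neighbors' readings at t and at `now`.
        return sqrt(sum((neighbors[name][t] - q) ** 2 for name, q in zip(strong, query)))

    nearest = sorted(past[-inner:], key=dist)[:k]
    return sum(target[t] for t in nearest) / len(nearest)


target = [10, 12, 14, 16, 18, 20, None]
a = [20, 24, 28, 32, 36, 40, 42]  # tracks the target exactly over the history
b = [5, 1, 7, 2, 9, 3, 4]         # weakly correlated with the target
print(impute_last(target, {"a": a, "b": b}))  # case 1 -> 18.0
print(impute_last(target, {"b": b}))          # case 2 -> 22.0
```

In the first call, neighbor `a` has r = 1 with the target, so the sketch averages the target readings at the three inner-window timestamps whose `a` values are closest to 42. In the second call, only `b` (r ≈ 0.16) is available, so it falls back to extending the recent trend 12, 14, ..., 20. Restricting both the correlation test and the candidate set to windows is what keeps the per-imputation cost bounded as the stream grows.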
Source
Journal of Southwest China Normal University (Natural Science Edition)
CAS
Peking University Core Journal
2015, No. 11, pp. 130-136 (7 pages)
Keywords
sensor networks
data stream
nested sliding window
missing data
data imputation