Abstract
Time series data are usually affected during measurement by the intrinsic variability of the observed quantity and by external interference. For the case where the degree of disturbance differs from one time point to another, an outlier detection method for time series data based on a Gaussian process prediction model was proposed. The monitoring data were decomposed into two components: a standard value and a deviation term. In addition to modelling the ideal, deviation-free standard value, a second Gaussian process was used to describe the heteroscedastic deviation term. The posterior distribution of the predicted data, which becomes analytically intractable once the deviation term is introduced, was approximated by variational inference, and a tolerance interval taken from this posterior was used to identify outliers. The method was verified on the public network traffic time series datasets released by Yahoo. The tolerance interval output by the model varied across time points in a way consistent with the deviations observed in the labelled normal data. In the comparison experiments, the proposed model achieved a higher F1-score than the autoregressive integrated moving average (ARIMA) model, the one-class support vector machine, and Density-Based Spatial Clustering of Applications with Noise (DBSCAN). The experimental results show that the proposed model can effectively describe the distribution of normal data at each time point, achieve a balanced tradeoff between false alarm rate and recall, and avoid the performance problems caused by improper parameter settings.
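The detection step described in the abstract (predict with a Gaussian process, derive a tolerance interval from the predictive distribution, flag points outside it) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes a squared-exponential kernel and a single constant noise variance, whereas the paper models a time-varying deviation with a second Gaussian process and variational inference. The function names, hyperparameter values, and synthetic data are all hypothetical.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance between two 1-D arrays of time points."""
    diff = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (diff / length_scale) ** 2)

def gp_tolerance_interval(t_train, y_train, t_test, noise_var=0.01, z=3.0,
                          length_scale=1.0, variance=1.0):
    """Return (mean, lower, upper) of the GP predictive distribution at t_test.

    Simplification: noise_var is one constant (homoscedastic), not the
    heteroscedastic deviation term described in the abstract.
    """
    K = rbf_kernel(t_train, t_train, length_scale, variance)
    K = K + noise_var * np.eye(len(t_train))
    K_s = rbf_kernel(t_test, t_train, length_scale, variance)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s @ alpha
    v = np.linalg.solve(L, K_s.T)
    latent_var = np.maximum(variance - np.sum(v ** 2, axis=0), 0.0)
    half_width = z * np.sqrt(latent_var + noise_var)
    return mean, mean - half_width, mean + half_width

def f1_score(pred, truth):
    """F1-score of boolean predictions against boolean ground-truth labels."""
    tp = np.sum(pred & truth)
    fp = np.sum(pred & ~truth)
    fn = np.sum(~pred & truth)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Synthetic example (hypothetical data): a clean reference series for
# training and a second series with two injected spikes for testing.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 10.0, 100)
y_train = np.sin(t) + 0.1 * rng.normal(size=t.size)
y_test = np.sin(t) + 0.1 * rng.normal(size=t.size)
truth = np.zeros(t.size, dtype=bool)
truth[[20, 60]] = True
y_test[truth] += 2.0

mean, lower, upper = gp_tolerance_interval(t, y_train, t)
flags = (y_test < lower) | (y_test > upper)  # outside the interval => outlier
score = f1_score(flags, truth)
```

In this sketch the interval width is nearly the same at every time point because the noise variance is fixed. Replacing `noise_var` with a per-time-point estimate is where a heteroscedastic model such as the one in the paper would differ.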
Authors
严宏
杨波
杨红雨
YAN Hong; YANG Bo; YANG Hongyu (College of Computer Science, Civil Aviation Flight University of China, Guanghan, Sichuan 618307, China; National Key Laboratory of Air Traffic Control Automation System Technology, Sichuan University, Chengdu, Sichuan 610064, China)
Source
《计算机应用》 (Journal of Computer Applications)
CSCD
Peking University Core Journals (北大核心)
2018, Issue 5, pp. 1346-1352 (7 pages)
Funding
National Air Traffic Control Research Funding Project (GKG201403004)
Keywords
outlier detection
time series
Gaussian process
heteroscedasticity
variational inference