Outlier detection in time series data based on heteroscedastic Gaussian processes

Cited by: 10
Abstract: Time series measurements are usually affected by the inherent variability of the observed process and by external interference, and the degree of disturbance differs from one time point to another. To detect outliers under such time-varying disturbances, an approach based on a Gaussian process prediction model is proposed. The monitoring data are decomposed into two components, a standard value and a deviation term. In addition to modeling the ideal, deviation-free standard value with a Gaussian process, a second Gaussian process is used to describe the heteroscedastic deviation term. Introducing the deviation term makes the posterior distribution analytically intractable, so it is approximated by variational inference. A tolerance interval taken from the posterior distribution is then used to decide whether a point is an outlier. Verification experiments were conducted on the public network traffic time series datasets released by Yahoo. The tolerance interval produced by the model varied across time points in agreement with the deviations observed in the labeled normal data. In comparison experiments, the proposed model achieved a higher F1-score than the autoregressive integrated moving average (ARIMA) model, the one-class support vector machine, and Density-Based Spatial Clustering of Applications with Noise (DBSCAN). The experimental results show that the model effectively describes the distribution of normal data at each time point, achieves a balance between false alarm rate and recall, and avoids the performance problems caused by improper parameter settings.
Authors: YAN Hong (严宏); YANG Bo (杨波); YANG Hongyu (杨红雨) (College of Computer Science, Civil Aviation Flight University of China, Guanghan, Sichuan 618307, China; National Key Laboratory of Air Traffic Control Automation System Technology, Sichuan University, Chengdu, Sichuan 610064, China)
Source: Journal of Computer Applications (《计算机应用》), CSCD, Peking University Core Journal, 2018, No. 5, pp. 1346-1352 (7 pages)
Funding: National Air Traffic Control Research Funding Project (GKG201403004)
Keywords: outlier detection; time series; Gaussian process; heteroscedasticity; variational inference
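
The idea summarized in the abstract (a standard value plus a deviation term whose variance changes over time, with a tolerance interval used to flag outliers) can be illustrated with a minimal sketch. This is not the authors' variational inference procedure. It swaps in a simpler two-stage approximation: one Gaussian process smooths the data, and a second smooths the log squared residuals to estimate a time-varying noise variance. The function names, the fixed kernel hyperparameters, the multiplier `z` and the synthetic data are all assumptions made for illustration, and the data are assumed to be roughly unit scale.

```python
import numpy as np

# Mean of log(chi-square with 1 degree of freedom): digamma(1/2) + log(2).
# Subtracting it removes the bias of log(residual^2) as an estimate of
# log(variance) when the residuals are Gaussian.
LOG_CHI2_MEAN = -1.2704


def rbf_kernel(a, b, length_scale, variance):
    """Squared-exponential covariance between 1-D input arrays a and b."""
    diff = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (diff / length_scale) ** 2)


def gp_smooth(t, y, noise_var, length_scale, variance):
    """GP posterior mean evaluated at the training inputs t.

    noise_var may be a scalar (constant noise) or one value per point.
    """
    K = rbf_kernel(t, t, length_scale, variance)
    noisy_K = K + np.diag(noise_var * np.ones(len(t)))
    return K @ np.linalg.solve(noisy_K, y)


def detect_outliers(t, y, z=3.0):
    """Return (standard value, interval half-width, boolean outlier flags).

    Hyperparameters are fixed by hand here (an assumption); a real model
    would learn them from the data.
    """
    center = y.mean()

    # Stage 1: rough standard value from a constant-noise GP.
    standard = center + gp_smooth(t, y - center, 0.1, 1.0, 1.0)

    # Stage 2: smooth the bias-corrected log squared residuals with a
    # second GP to obtain a time-varying noise variance.
    log_r2 = np.log((y - standard) ** 2 + 1e-6) - LOG_CHI2_MEAN
    level = log_r2.mean()
    log_var = level + gp_smooth(t, log_r2 - level, 5.0, 2.0, 4.0)
    noise_var = np.maximum(np.exp(log_var), 1e-6)

    # Stage 3: refit the standard value using the per-point noise.
    standard = center + gp_smooth(t, y - center, noise_var, 1.0, 1.0)

    # Tolerance interval: standard value +/- z noise standard deviations.
    # The posterior uncertainty of the standard value itself is ignored.
    half_width = z * np.sqrt(noise_var)
    flags = np.abs(y - standard) > half_width
    return standard, half_width, flags


# Synthetic example: noise grows with time and one spike is injected.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 10.0, 200)
noise_std = 0.05 + 0.05 * t
y = np.sin(t) + rng.normal(0.0, noise_std)
y[100] += 5.0
standard, half_width, flags = detect_outliers(t, y)
```

In this sketch the interval widens where the residuals are larger, so a deviation of a given size can be flagged in a quiet stretch of the series and accepted in a noisy one. A fixed-width threshold cannot make that distinction.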