Abstract
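To address the data distortion and data duplication that can occur during gas sensor data acquisition, an anomaly detection method based on a time-series sliding window is proposed. The original time series is split into multiple subsequences with a sliding window; the confidence-interval distance radius of the slope is used to extract the temporal features of each subsequence and identify suspected anomalous sequences; time series decomposition and Density-based Spatial Clustering of Applications with Noise (DBSCAN) are then used to further identify outliers. With volatile organic compounds (VOCs) data from a certain region as the validation dataset, the detection results show that the algorithm accurately identifies anomalous subsequences and outliers, with precision, recall, and balanced F-score (F1) of 93.7%, 90.7%, and 92.18%, respectively, which verifies the usability of the proposed method. In addition, for anomalies that take the form of missing values, a recovery model based on Support Vector Regression (SVR) is proposed; on validation its coefficient of determination R² is 96.53%, better than that of the comparison models.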
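As a quick consistency check on the reported metrics, the F1 score is the harmonic mean of precision and recall. The helper name `f1_score` below is illustrative and is not taken from the paper; the inputs are the precision and recall stated in the abstract.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units in, same units out)."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract, in percent.
print(round(f1_score(93.7, 90.7), 2))  # 92.18
```

The result agrees with the reported F1 of 92.18%.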
Environmental monitoring systems may fail to collect accurate pollutant concentration data from sensor networks because of system failures and other causes. This study proposes detection and processing methods for three common types of anomalies in the acquisition process. For distorted data and duplicate data, an anomaly detection method based on the characteristics of the time series is proposed. The method has two stages. The first stage divides the original time series into multiple subsequences using a sliding window model. Subsequence features based on the confidence-interval distance radius of the window slope are extracted to identify suspected anomalous sequences. In the second stage, the time series of the current window is decomposed using the Seasonal and Trend decomposition using Loess (STL) method, and the residual series is obtained by removing the periodic term and the trend term from the original series. Then, based on cluster analysis (DBSCAN), the points that cannot be assigned to any cluster are identified as outliers, and finally the outlier information is output. The Volatile Organic Compounds (VOCs) data of a region are used as the validation dataset. Test results show that the algorithm can accurately identify anomalous subsequences and outliers. A precision, recall, and F1-score of 93.7%, 90.7%, and 92.18%, respectively, verify the usability of the proposed method. For missing data, a recovery model based on Support Vector Regression (SVR) is proposed. First, the dimensionality of the input features is reduced using Principal Component Analysis (PCA). The Particle Swarm Optimization (PSO) algorithm is then used to find the optimal parameters, which avoids the loss of accuracy caused by manually set parameters. The recovery model is evaluated on the validation set and compared with the ARIMA and PSO-SVR algorithms. The results show that the Mean Square Error (MSE), Mean Absolute Error (MAE), and Coefficient of Determination (R²) of the proposed model are better than those of the comparison models.
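The first stage described above (sliding windows, a slope per window, and a confidence-interval radius used to flag suspect windows) can be sketched roughly as follows. The abstract does not give the exact definition of the "confidence-interval distance radius", so this sketch assumes a simple stand-in: a window is flagged when its least-squares slope lies further than `z` sample standard deviations from the mean slope of all windows. The function names, the window width, the step, and `z = 1.96` are all hypothetical choices, not the authors' settings.

```python
from statistics import mean, stdev


def window_slopes(series, width, step):
    """Return (start_index, least-squares slope) for each sliding window."""
    xs = list(range(width))
    x_bar = mean(xs)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slopes = []
    for start in range(0, len(series) - width + 1, step):
        window = series[start:start + width]
        y_bar = mean(window)
        sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, window))
        slopes.append((start, sxy / sxx))
    return slopes


def suspect_windows(series, width=5, step=5, z=1.96):
    """Flag windows whose slope falls outside mean +/- z * stdev of all slopes.

    Assumption: this normal-approximation radius stands in for the paper's
    confidence-interval distance radius, which the abstract does not define.
    """
    slopes = window_slopes(series, width, step)
    values = [slope for _, slope in slopes]
    centre = mean(values)
    radius = z * stdev(values)
    return [start for start, slope in slopes if abs(slope - centre) > radius]


# Synthetic readings: flat at 10.0 with one steep ramp starting at index 20.
readings = [10.0] * 20 + [10.0, 20.0, 30.0, 40.0, 50.0] + [10.0] * 15
print(suspect_windows(readings))  # [20]
```

Windows flagged here would only be candidates; in the method described above, the second stage (STL residuals followed by DBSCAN) decides which individual points are outliers.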
Authors
陆秋琴
王璐
黄光球
LU Qiuqin; WANG Lu; HUANG Guangqiu (School of Management, Xi'an University of Architecture and Technology, Xi'an 710055, China)
Source
《安全与环境学报》
CAS
CSCD
Peking University Core Journals (北大核心)
2023, Issue 12, pp. 4590-4599 (10 pages)
Journal of Safety and Environment
Funding
National Natural Science Foundation of China (71874134).