Abstract
Detecting anomalous sequences in ordered data-stream time series is a key task in network intrusion detection, disaster monitoring, fault detection, stock market analysis, medical diagnosis, and related fields. Existing sequence anomaly detection methods based on similarity, deviation, and density suffer from low detection accuracy and poor efficiency. To address this, the application of sequence pattern matching to frequent-sequence anomaly detection in big data streams is studied. The big data stream is clustered using a linear regression model and residual analysis. Based on the clustering results, frequent-sequence features of the big data stream are extracted with a segmentation method. The distances between sequence features are then computed to perform sequence pattern matching and comparison, which identifies the anomalous sequences. Finally, simulation experiments test the effect of applying sequence pattern matching to frequent-sequence anomaly detection in big data streams. The results show that, compared with the three sequence anomaly detection methods based on similarity, deviation, and density, the sequence pattern matching method improves detection accuracy by 2.4%, 7.3%, and 10.1%, respectively, and shortens detection time by 4.1 s, 11.7 s, and 11.4 s, respectively. The proposed sequence pattern matching method is therefore better suited to sequence anomaly detection.
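The pipeline outlined in the abstract (linear fit with residual analysis, segment-wise feature extraction, distance-based matching) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract gives no formulas, so the feature triple (slope, mean, residual RMS), the fixed segment length, the single reference pattern standing in for the clustering step, the Euclidean distance, and the threshold are all assumptions, and every function name is hypothetical.

```python
import math

def linear_fit(values):
    """Least-squares line y = a*t + b over t = 0..n-1; returns (a, b)."""
    n = len(values)
    t_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    a = sxy / sxx if sxx else 0.0
    return a, y_mean - a * t_mean

def segment_features(sequence, seg_len):
    """Split a sequence into fixed-length segments and describe each one by
    (slope, mean, residual RMS) of its fitted line. Assumed feature set."""
    feats = []
    for start in range(0, len(sequence) - seg_len + 1, seg_len):
        seg = sequence[start:start + seg_len]
        a, b = linear_fit(seg)
        residuals = [y - (a * t + b) for t, y in enumerate(seg)]
        rms = math.sqrt(sum(r * r for r in residuals) / seg_len)
        feats.extend([a, sum(seg) / seg_len, rms])
    return feats

def feature_distance(f1, f2):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(f1, f2)))

def detect_anomalies(sequences, reference, seg_len=4, threshold=1.0):
    """Return indices of sequences whose features lie farther than
    `threshold` from the reference pattern (assumed matching rule)."""
    ref_feat = segment_features(reference, seg_len)
    return [i for i, s in enumerate(sequences)
            if feature_distance(segment_features(s, seg_len), ref_feat) > threshold]

# Toy data: one reference pattern and three candidate sequences.
reference = [0, 1, 2, 3, 4, 5, 6, 7]
sequences = [
    [0, 1, 2, 3, 4, 5, 6, 7],                  # identical to the reference
    [0.1, 1.0, 2.1, 3.0, 3.9, 5.0, 6.1, 7.0],  # small noise
    [0, 1, 2, 3, 9, 1, 8, 0],                  # second half deviates strongly
]
print(detect_anomalies(sequences, reference))  # [2]
```

In this toy run only the third sequence is flagged, because the slope and residual of its second segment differ sharply from the reference. In practice the threshold and segment length would need tuning on real data, and the reference would come from the frequent patterns found by the clustering step rather than being supplied by hand.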
Authors
段淼
粱杰
DUAN Miao; LIANG Jie (School of Electrical and Computer, Jilin Jianzhu University, Changchun 130118, China; Vocational Foundation Department, Changchun Polytechnic, Changchun 130000, China)
Source
Modern Electronics Technique (《现代电子技术》)
2021, No. 3, pp. 59-64 (6 pages)
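Funding
Supported by the National Natural Science Foundation of China (61404069)
Supported by the National Natural Science Foundation of China (61300230)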
Keywords
sequence pattern matching
big data stream
frequent sequence
anomaly detection
big data stream clustering
feature extraction