Finding Good-Quality Surprising Patterns in Time Series Data


Abstract: Discovering surprising patterns in time series databases is an important problem. Existing algorithms define and discover surprising patterns from the shape features of a time series, while ignoring its underlying mechanism and statistical regularities. To overcome this shortcoming, we propose a definition of a surprising pattern based on time series prediction, namely an event that contains sufficiently many exceptions, and a systematic algorithm for discovering such patterns. The time series is first discretized into a string of 0s and 1s; a simple algorithm then finds all surprising patterns in this string. Experiments show that the proposed algorithm not only finds the surprising patterns defined by Keogh et al. but also avoids reporting meaningless ones.

Aim. Previous methods for finding surprising patterns in time series data suffer, in our opinion, from three shortcomings: (1) they use very limited shape features of the time series data; (2) they ignore the statistical features of the time series data; and (3) they do not recognize that suitable models can reduce the number of subsequences that have surprising patterns. We now present what we believe to be a better method. The full paper explains our method in detail; this abstract only adds some pertinent remarks on its two topics: (1) the formal description of a surprising pattern and (2) the algorithm for finding surprising patterns. In the first topic, we give five definitions and a theorem with its proof. The second topic has three subtopics: the proposed algorithm (subtopic 2.1), the determination of the threshold values (subtopic 2.2), and the analysis of the computational complexity of the proposed algorithm (subtopic 2.3). In the second topic, we give a five-step flowchart, based on the theorem in the first topic, for finding surprising patterns. Most importantly, in subtopic 2.1 we explain the suitable modeling that reduces the number of subsequences that have surprising patterns. The algorithm achieves a data compression ratio of about 32:1 or 64:1, so it can be used on massive time series databases. The experimental results, given in a figure in the full paper, demonstrate preliminarily that, through suitable modeling, the proposed method can not only find the surprising patterns defined by Keogh et al. but also omit patterns in the time series data that are not really surprising.
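The two stages named in the abstract (discretize the series into a 0/1 string, then scan that string for subsequences with enough exceptions) can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not state the prediction model, the thresholds, or the window length, so the naive last-value forecast, the `threshold`, `width`, and `min_exceptions` parameters, and both function names are assumptions made only for illustration.

```python
from typing import List, Tuple


def discretize(series: List[float], threshold: float) -> List[int]:
    """Turn a series into a 0/1 list: 1 marks an 'exception'.

    Assumption: the forecast for each point is simply the previous value.
    A point is an exception when the forecast misses it by more than
    `threshold`. The first point has no forecast and is marked 0.
    """
    bits = [0]
    for prev, cur in zip(series, series[1:]):
        bits.append(1 if abs(cur - prev) > threshold else 0)
    return bits


def surprising_windows(
    bits: List[int], width: int, min_exceptions: int
) -> List[Tuple[int, int]]:
    """Return (start_index, exception_count) for every window of `width`
    consecutive bits that holds at least `min_exceptions` ones."""
    if width <= 0 or width > len(bits):
        return []
    hits = []
    count = sum(bits[:width])  # exceptions in the first window
    for start in range(len(bits) - width + 1):
        if start > 0:
            # Slide the window: add the entering bit, drop the leaving bit.
            count += bits[start + width - 1] - bits[start - 1]
        if count >= min_exceptions:
            hits.append((start, count))
    return hits


series = [1.0, 1.1, 1.0, 5.0, 1.0, 5.2, 1.1, 1.0, 1.1, 1.0]
bits = discretize(series, threshold=1.0)
print(bits)
print(surprising_windows(bits, width=4, min_exceptions=3))
```

Because the scan keeps a running count, it makes a single pass over the bit string. One plausible reading of the 32:1 or 64:1 compression figure is that each 32-bit or 64-bit sample is replaced by a single bit, although the abstract does not say so explicitly.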

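Authors: Li Aiguo, Li Zhanhuai

Affiliation: School of Computer Science and Engineering, Northwestern Polytechnical University

Source: Journal of Northwestern Polytechnical University (indexed in EI, CAS, CSCD, PKU Core), 2007, No. 3, pp. 425-428 (4 pages)

Funding: Supported by the National Natural Science Foundation of China (60573096)

Keywords: data mining, time series, surprising pattern, knowledge acquisition, modeling

Classification: TP311.13 [Automation and Computer Technology: Computer Software and Theory]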


References (4)

1. Keogh E, Lonardi S, Chiu B. Finding Surprising Patterns in a Time Series Database in Linear Time and Space. Proc of SIGKDD. Edmonton, Alberta, Canada, 2002
2. Shahabi C, Tian X, Zhao W. TSA-Tree: A Wavelet-Based Approach to Improve the Efficiency of Multi-Level Surprise and Trend Query. Proc of 12th International Conference on Scientific and Statistical Database Management, Berlin, Germany, 2000, 56-68
3. Chakrabarti S, Sarawagi S, Dom B. Mining Surprising Patterns Using Temporal Description Length. Proc of the 24th VLDB, New York, USA, 1998
4. Li Aiguo, Qin Zheng. Predicting chaotic time series by adaptive local linearization [J]. Systems Engineering - Theory & Practice, 2004, 24(6): 67-71 (in Chinese)

