Abstract
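Discovering surprising patterns in time series databases is an important problem. Existing algorithms define and discover surprising patterns from the shape features of a time series, while ignoring the series' underlying mechanism and its statistical regularities. To overcome this shortcoming, we propose a definition of surprising patterns based on time series prediction, namely a pattern that contains sufficiently many exceptional events, together with a systematic algorithm for discovering such patterns. The time series is first discretized into a string of 0s and 1s; a simple algorithm then finds all surprising patterns in that string. Experiments show that the proposed algorithm not only finds the surprising patterns defined by Keogh et al. but also avoids reporting meaningless ones.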
Aim. Previous methods for finding surprising patterns in time series data suffer, in our opinion, from three shortcomings: (1) they use only very limited shape features of the data, (2) they ignore the statistical features of the data, and (3) they do not recognize that a suitable model can reduce the number of subsequences reported as surprising patterns. We present what we believe to be a better method. The full paper explains the method in detail; this abstract adds only some pertinent remarks on its two topics: (1) the formal description of a surprising pattern and (2) the algorithm for finding surprising patterns. In the first topic, we give five definitions and a theorem with its proof. In the second topic, we give a five-step flowchart, based on the theorem of the first topic, for finding surprising patterns; its three subtopics are the proposed algorithm (subtopic 2.1), the determination of the threshold values (subtopic 2.2), and the analysis of the computational complexity of the proposed algorithm (subtopic 2.3). Most importantly, subtopic 2.1 explains the suitable modeling that reduces the number of subsequences reported as surprising patterns. The algorithm achieves a data compression rate of about 32 : 1 or 64 : 1, so it can be used in massive time series databases. The experimental results, given in a figure in the full paper, demonstrate preliminarily that, through suitable modeling, the proposed method can not only find the surprising patterns defined by Keogh et al. but also omit those patterns in the time series data that are not really surprising.
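The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea it describes: mark each sample as 1 when a prediction misses it (an "exception") and 0 otherwise, then report windows of the resulting bit string that contain enough exceptions. The persistence forecast (predict the previous value), the tolerance `tol`, the fixed window `width`, and the cutoff `min_exceptions` are all assumptions made for illustration, and the function names are hypothetical; none of them is taken from the paper.

```python
from typing import List, Sequence, Tuple


def discretize(series: Sequence[float], tol: float) -> List[int]:
    """Turn a series into a 0/1 list; 1 marks a sample the forecast missed.

    Assumption: a naive persistence forecast (next value = previous value)
    stands in for whatever prediction model the paper actually uses.
    """
    if not series:
        return []
    bits = [0]  # the first sample has no prediction to compare against
    for prev, cur in zip(series, series[1:]):
        predicted = prev
        bits.append(1 if abs(cur - predicted) > tol else 0)
    return bits


def surprising_windows(
    bits: Sequence[int], width: int, min_exceptions: int
) -> List[Tuple[int, int]]:
    """Return (start, exception_count) for every window with enough 1s.

    A running sum keeps the scan linear in the length of the bit string.
    """
    hits: List[Tuple[int, int]] = []
    if width <= 0 or width > len(bits):
        return hits
    count = sum(bits[:width])
    if count >= min_exceptions:
        hits.append((0, count))
    for start in range(1, len(bits) - width + 1):
        # Slide the window: add the entering bit, drop the leaving bit.
        count += bits[start + width - 1] - bits[start - 1]
        if count >= min_exceptions:
            hits.append((start, count))
    return hits


if __name__ == "__main__":
    series = [1.0, 1.1, 1.0, 5.0, 1.0, 5.0, 1.1, 1.0, 1.0]
    bits = discretize(series, tol=0.5)
    print(bits)
    print(surprising_windows(bits, width=4, min_exceptions=3))
```

Storing one bit per sample in place of a 32-bit or 64-bit floating-point value would be one way to arrive at compression rates of the order the abstract mentions, though that reading is an inference rather than something the abstract states.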
Source
《西北工业大学学报》
EI
CAS
CSCD
PKU Core Journals (北大核心)
2007, No. 3, pp. 425-428 (4 pages)
Journal of Northwestern Polytechnical University
Funding
Supported by the National Natural Science Foundation of China (Grant No. 60573096)