Funding: Supported in part by the National High-tech R&D Program of China under Grants No. 2012AA012600, 2011AA010702, 2012AA01A401, and 2012AA01A402; the National Natural Science Foundation of China under Grant No. 60933005; the National Science and Technology Ministry of China under Grant No. 2012BAH38B04; and the National 242 Information Security of China under Grant No. 2011A010.
Abstract: Similarity search is a fundamental component of time series data mining tasks such as clustering, classification, and association rule mining. Many measures of similarity between time series have been proposed, including Euclidean distance, Manhattan distance, and dynamic time warping (DTW). Compared with the other two, DTW has been suggested to give a more robust similarity measure and can find the optimal alignment between time series. However, because of its quadratic time and space complexity, DTW is not suitable for large time series datasets. Many improved algorithms have been proposed for DTW search in large databases, such as approximate search and exact indexed search. Unlike these modified algorithms, this paper presents a novel parallel scheme for fast DTW-based similarity search, called MRDTW (MapReduce-based DTW). The experimental results show that the approach not only retains the accuracy of the original DTW but also greatly improves the efficiency of similarity measurement on large time series.
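To make the quadratic cost and the map/reduce structure mentioned in the abstract concrete, the following is a minimal single-machine sketch. It is an illustration under assumptions, not the MRDTW scheme itself: the function names `dtw_distance` and `nearest_series` are hypothetical, the local cost is taken to be the absolute difference, and Python's built-in `map` and `min` stand in for a real MapReduce framework.

```python
from math import inf


def dtw_distance(a, b):
    """Classic dynamic-programming DTW; fills a (len(a)+1) x (len(b)+1) table."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])  # assumed local cost
            cost[i][j] = d + min(cost[i - 1][j],      # a[i-1] matched again
                                 cost[i][j - 1],      # b[j-1] matched again
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


def nearest_series(query, candidates):
    """Map: one independent DTW distance per candidate. Reduce: keep the minimum."""
    scored = map(lambda c: (dtw_distance(query, c), c), candidates)
    return min(scored, key=lambda pair: pair[0])


query = [0, 1, 2, 1, 0]
candidates = [[5, 5, 5, 5, 5], [0, 0, 1, 2, 1, 0]]
best_distance, best_match = nearest_series(query, candidates)
```

Because each candidate's distance is computed independently of the others, the map step is the part that can be distributed across workers, while the result stays identical to plain DTW. Note that the second candidate has a different length from the query, which DTW handles by warping.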
Funding: Project (2011AA040603) supported by the National High Technology Research & Development Program of China; Project (201202226) supported by the Natural Science Foundation of Liaoning Province, China.
Abstract: The detection of outliers and change points in time series has become a research focus in time series data mining, since it can be used for fraud detection, rare event discovery, event/trend change detection, and similar tasks. In most previous work, outlier detection and change point detection have not been related explicitly, and change point detection has not considered the influence of outliers. In this work, a unified detection framework is presented to deal with both. The framework is based on Alarcon-Aquino and Barria's change point detection method and adopts a two-stage detection to separate outliers from change points. It has two advantages. First, the unified structure for change detection and outlier detection reduces the computational complexity and makes the detection procedure simple. Second, performing outlier detection before change point detection removes the influence of outliers on change point detection and thus improves its accuracy. Simulation experiments on both model data and actual application data achieved 100% detection accuracy. Comparisons between the traditional detection method and the proposed method further demonstrate that the unified detection structure is more accurate when the time series is contaminated by outliers.
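The ordering argument (outliers first, change points second) can be illustrated with a toy sketch. This is not Alarcon-Aquino and Barria's method or the paper's framework: both detectors below are simple stand-ins chosen for brevity, and the names `find_outliers` and `find_change_point`, the window size, and the threshold are all assumptions.

```python
from statistics import mean, median


def find_outliers(series, window=2, threshold=3.0):
    """Stage 1 (stand-in): flag points far from the median of their neighbours."""
    flagged = []
    for i, value in enumerate(series):
        neighbours = series[max(0, i - window):i] + series[i + 1:i + 1 + window]
        if abs(value - median(neighbours)) > threshold:
            flagged.append(i)
    return flagged


def find_change_point(series, outliers):
    """Stage 2 (stand-in): split with the largest mean shift, ignoring outliers."""
    skip = set(outliers)
    kept = [(i, v) for i, v in enumerate(series) if i not in skip]
    best_index, best_shift = None, 0.0
    for k in range(1, len(kept)):
        left = mean([v for _, v in kept[:k]])
        right = mean([v for _, v in kept[k:]])
        if abs(right - left) > best_shift:
            best_index, best_shift = kept[k][0], abs(right - left)
    return best_index


# One spike at index 3, and a genuine level shift starting at index 7.
series = [0, 0, 0, 9, 0, 0, 0, 5, 5, 5, 5, 5]
outliers = find_outliers(series)
with_cleaning = find_change_point(series, outliers)
without_cleaning = find_change_point(series, [])
```

On this hand-made series, skipping the first stage lets the spike pull the estimated change point to index 3, while removing the flagged outlier first recovers index 7. The data are invented for illustration only.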
Abstract: Pattern discovery from time series is of fundamental importance. Most pattern discovery algorithms for time series compare the values of the series using some kind of similarity measure. Because they are affected by scale and baseline, value-based methods cause problems when the objective is to capture shape. Thus, a shape-based similarity measure, the Sh measure, is proposed, and its properties and the corresponding proofs are given. A time series shape pattern discovery algorithm based on the Sh measure is then put forward. The proposed algorithm terminates in a finite number of iterations, and its computational and storage complexity are given. Finally, experiments on synthetic datasets and sunspot datasets demonstrate that the time series shape pattern algorithm is valid.
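The effect of scale and baseline on a value-based measure can be shown with a small sketch. The abstract does not define the Sh measure, so the code below substitutes z-normalization followed by Euclidean distance, a common shape-oriented stand-in; the function names and the sample data are assumptions made for illustration.

```python
from math import sqrt


def euclidean(a, b):
    """Value-based distance between two equal-length series."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def z_normalize(series):
    """Remove baseline (mean) and scale (standard deviation) from a series."""
    n = len(series)
    mean = sum(series) / n
    std = sqrt(sum((x - mean) ** 2 for x in series) / n)
    if std == 0:
        return [0.0] * n  # a flat series has no shape to compare
    return [(x - mean) / std for x in series]


base = [0.0, 1.0, 2.0, 1.0, 0.0]
# Same shape, but stretched by 3 and lifted by 10.
shifted_scaled = [10.0 + 3.0 * x for x in base]

raw_gap = euclidean(base, shifted_scaled)
shape_gap = euclidean(z_normalize(base), z_normalize(shifted_scaled))
```

Here `raw_gap` is large even though the two series have the same shape, while `shape_gap` is zero up to floating-point error.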
Funding: National Natural Science Foundation of China (No. 60674088); Shandong Education Committee 2007 Scientific Research Development Plan (No. J07WJ20).
Abstract: A fundamental problem in whole sequence matching and subsequence matching is the representation of time series. In the last decade, many high-level representations of time series have been proposed for data mining, each involving a trade-off between accuracy and compactness. In this paper, the author proposes a novel time series representation called the Grid Minimum Bounding Rectangle (GMBR), which is based on the Minimum Bounding Rectangle and applies a binary idea to it. Experiments have been performed on synthetic as well as real data sequences to evaluate the proposed method. The experiments demonstrate that 69%-92% of irrelevant sequences are pruned using the proposed method.
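How a bounding-rectangle representation prunes irrelevant sequences can be sketched as follows. This shows plain per-segment Minimum Bounding Rectangles only; the grid and binary encoding that distinguish GMBR are not described in the abstract and are not reproduced. The function names, the fixed segment width, and the assumption that query and candidate have equal length are all illustrative choices.

```python
from math import sqrt


def segment_mbrs(series, width):
    """Summarize each consecutive segment of `width` points by its (low, high) range."""
    return [(min(series[i:i + width]), max(series[i:i + width]))
            for i in range(0, len(series), width)]


def lower_bound(query, mbrs, width):
    """Lower bound on the Euclidean distance from `query` to the summarized series.

    Each candidate value lies inside its segment's range, so the gap between a
    query point and that range can never exceed the true pointwise difference.
    """
    total = 0.0
    for t, q in enumerate(query):
        low, high = mbrs[t // width]
        gap = low - q if q < low else q - high if q > high else 0.0
        total += gap * gap
    return sqrt(total)


candidate = [0, 1, 2, 3, 10, 11, 12, 13]
mbrs = segment_mbrs(candidate, width=4)
query = [5] * 8
bound = lower_bound(query, mbrs, width=4)
true_distance = sqrt(sum((q - c) ** 2 for q, c in zip(query, candidate)))
```

If `bound` already exceeds the search threshold, the candidate can be discarded without reading its raw values, which is the kind of pruning the reported percentages refer to.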