Outlier detection is a very useful technique in many applications where data is generally uncertain and can be described using probability. While it has been studied intensively for deterministic data, outlier detection is still novel in the emerging field of uncertain data. In this paper, we study the semantics of outlier detection on probabilistic data streams and present a new definition of a distance-based outlier over a sliding window. We then show that the problem of detecting an outlier over a set of possible world instances is equivalent to the problem of finding the k-th element in its neighborhood. Based on this observation, a dynamic programming algorithm (DPA) is proposed to reduce the detection cost from O(2^|R(e, d)|) to O(|k·R(e, d)|), where R(e, d) is the d-neighborhood of e. Furthermore, we propose a pruning-based approach (PBA) to effectively and efficiently filter non-outliers on a single window and to incrementally detect outliers among the most recent m elements. Finally, detailed analysis and thorough experimental results demonstrate the efficiency and scalability of our approach.
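The abstract does not spell out the DPA itself, so the following is only a minimal sketch of the general idea under stated assumptions: each neighbor in R(e, d) exists independently with a known probability, and e is flagged when the probability of having fewer than k existing neighbors reaches a threshold. The function names, the independence assumption, and the `threshold` parameter are illustrative and not taken from the paper. The dynamic program runs in O(k·|R(e, d)|) time, while the brute-force version enumerates all 2^|R(e, d)| possible worlds.

```python
from itertools import product
from typing import Sequence


def prob_fewer_than_k(neighbor_probs: Sequence[float], k: int) -> float:
    """Probability that fewer than k neighbors exist in a possible world.

    Assumes (hypothetically) that neighbors exist independently.
    Runs in O(k * len(neighbor_probs)) time.
    """
    if k <= 0:
        return 0.0
    # dp[j] = probability that exactly j of the neighbors seen so far exist,
    # tracked only for j < k; mass reaching k or more is discarded.
    dp = [0.0] * k
    dp[0] = 1.0
    for p in neighbor_probs:
        # Update from high j to low j so dp[j - 1] is still the old value.
        for j in range(k - 1, 0, -1):
            dp[j] = dp[j] * (1.0 - p) + dp[j - 1] * p
        dp[0] *= 1.0 - p
    return sum(dp)


def prob_fewer_than_k_bruteforce(neighbor_probs: Sequence[float], k: int) -> float:
    """Same quantity by enumerating all 2^n possible worlds (for checking only)."""
    total = 0.0
    for world in product([0, 1], repeat=len(neighbor_probs)):
        if sum(world) < k:
            weight = 1.0
            for present, p in zip(world, neighbor_probs):
                weight *= p if present else 1.0 - p
            total += weight
    return total


def is_outlier(neighbor_probs: Sequence[float], k: int, threshold: float) -> bool:
    """Hypothetical decision rule: outlier if P(fewer than k neighbors) >= threshold."""
    return prob_fewer_than_k(neighbor_probs, k) >= threshold
```

For example, `is_outlier([0.9, 0.2, 0.7, 0.4], k=2, threshold=0.5)` evaluates the rule for an element with four uncertain neighbors; the brute-force function can be used to check the dynamic program on small neighborhoods.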
Sliding-window multi-stream join (SWMJ) is a fundamental operation for correlating information from different streams. We provide a solution to the problem of assessing the significance of the SWMJ result by focusing on the relative frequency of windows satisfying a given equijoin predicate as the most important parameter of the SWMJ result. In particular, we derive a formula for computing the expected relative frequency of windows satisfying a given equijoin predicate that can be evaluated in time quadratic in the window size, given a proposed probabilistic model of the multi-stream. In experiments conducted on a daily rainfall data set, we demonstrate the remarkable accuracy of our method, which confirms our theoretical analysis.
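The abstract does not give the formula for the expected relative frequency, so it is not reproduced here. As a rough illustration of the quantity being estimated, the sketch below computes the observed relative frequency of windows satisfying an equijoin predicate. It assumes, purely for illustration, that the streams are aligned by position and that a window satisfies the predicate when at least one value occurs in every stream inside that window; the function name `window_join_frequency` is hypothetical.

```python
from typing import Hashable, Sequence


def window_join_frequency(streams: Sequence[Sequence[Hashable]], w: int) -> float:
    """Observed fraction of sliding windows in which some value joins all streams.

    Assumption (not from the paper): streams are position-aligned, and a window
    satisfies the equijoin predicate if the streams share at least one value
    within that window.
    """
    if not streams:
        raise ValueError("at least one stream is required")
    n = min(len(s) for s in streams)
    if w <= 0 or n < w:
        raise ValueError("window size must be in 1..len(shortest stream)")

    positions = n - w + 1  # number of sliding-window positions
    hits = 0
    for start in range(positions):
        # Values common to every stream within the current window.
        common = set(streams[0][start:start + w])
        for s in streams[1:]:
            common &= set(s[start:start + w])
            if not common:
                break
        if common:
            hits += 1
    return hits / positions
```

An observed frequency of this kind could then be compared against the expected frequency under a probabilistic model of the multi-stream to judge whether the join result is significant.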
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 60973020, 60828004, and 60933001, the Program for New Century Excellent Talents in University of China under Grant No. NCET-06-0290, and the Fundamental Research Funds for the Central Universities under Grant No. N090504004.