Funding: Supported by the National Natural Science Foundation of China (60573089) and the National 985 Project Foundation (985-2-DB-Y01).
Abstract: How to process aggregate queries over data streams efficiently and effectively has become a hot research topic in both academia and industry. To address this issue, this paper proposes a novel Linked-tree algorithm based on sliding windows. By introducing the concept of an area, the Linked-tree algorithm reuses many of the results computed in the previous window and thereby avoids many unnecessary repeated comparisons between two successive windows. As a result, the execution efficiency of MAX queries improves dramatically. In addition, because the memory required depends on the number of areas rather than on the size of the sliding window, memory usage is greatly reduced. Extensive experimental results show that the Linked-tree algorithm significantly outperforms the traditional SC (Simple Compared) algorithm and the Ranked-tree algorithm.
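The abstract does not spell out the Linked-tree structure itself. The following is a minimal sketch of the underlying reuse idea only, assuming that an "area" is a fixed-size block of consecutive tuples and that the count-based window slides by one whole area at a time (a pane-style partitioning, not the paper's Linked-tree algorithm). The function names `area_sliding_max` and `simple_compared_max` are hypothetical; the second one recomputes every window from scratch, in the spirit of an SC-style baseline.

```python
from collections import deque
from typing import Iterable, List, Optional


def area_sliding_max(stream: Iterable[float], area_size: int,
                     areas_per_window: int) -> List[float]:
    """MAX of each sliding window made of `areas_per_window` areas.

    Only one maximum per sealed area is stored, so memory grows with the
    number of areas rather than with the window length in tuples.
    """
    if area_size <= 0 or areas_per_window <= 0:
        raise ValueError("area_size and areas_per_window must be positive")
    area_maxes: deque = deque()          # per-area maxima inside the window
    results: List[float] = []
    current_max: Optional[float] = None  # running max of the area being filled
    filled = 0                           # tuples seen in the current area
    for value in stream:
        current_max = value if current_max is None else max(current_max, value)
        filled += 1
        if filled == area_size:          # area complete: seal its maximum
            area_maxes.append(current_max)
            current_max, filled = None, 0
            if len(area_maxes) > areas_per_window:
                area_maxes.popleft()     # expire the oldest area
            if len(area_maxes) == areas_per_window:
                # Reuse stored area maxima instead of rescanning every tuple.
                results.append(max(area_maxes))
    return results


def simple_compared_max(stream: Iterable[float], area_size: int,
                        areas_per_window: int) -> List[float]:
    """Baseline: rescan all tuples of every window position."""
    values = list(stream)
    window = area_size * areas_per_window
    return [max(values[start:start + window])
            for start in range(0, len(values) - window + 1, area_size)]


if __name__ == "__main__":
    data = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]
    print(area_sliding_max(data, area_size=2, areas_per_window=2))     # [4, 9, 9, 6]
    print(simple_compared_max(data, area_size=2, areas_per_window=2))  # [4, 9, 9, 6]
```

Each slide costs one comparison per area instead of one per tuple, which is the kind of saving between successive windows that the abstract attributes to areas.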
Abstract: Sliding-window multi-stream join (SWMJ) is a fundamental operation for correlating information from different streams. We provide a solution to the problem of assessing the significance of the SWMJ result by focusing on the relative frequency of windows satisfying a given equijoin predicate as the most important parameter of the SWMJ result. In particular, given a proposed probabilistic model of the multi-stream, we derive a formula for the expected relative frequency of windows satisfying a given equijoin predicate that can be evaluated in time quadratic in the window size. In experiments conducted on a daily rainfall data set, we demonstrate the remarkable accuracy of our method, which confirms our theoretical analysis.
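The paper derives the *expected* relative frequency analytically under its own probabilistic model, which the abstract does not give. As a point of reference, here is a minimal sketch that computes the *observed* relative frequency from data, assuming aligned count-based windows of equal size over all streams and assuming a window position satisfies the equijoin predicate when at least one join-attribute value occurs in every stream's window. Both assumptions, and the names `window_satisfies_equijoin` and `relative_frequency`, are hypothetical.

```python
from typing import Hashable, Sequence


def window_satisfies_equijoin(windows: Sequence[Sequence[Hashable]]) -> bool:
    """True if some join value appears in every stream's window (multi-way equijoin)."""
    common = set(windows[0])
    for window in windows[1:]:
        common &= set(window)
        if not common:               # no shared value left: predicate fails
            return False
    return bool(common)


def relative_frequency(streams: Sequence[Sequence[Hashable]], window_size: int) -> float:
    """Fraction of aligned window positions whose windows satisfy the predicate."""
    length = min(len(s) for s in streams)
    positions = length - window_size + 1
    if window_size <= 0 or positions <= 0:
        raise ValueError("window_size must be between 1 and the shortest stream length")
    hits = sum(
        window_satisfies_equijoin([s[t:t + window_size] for s in streams])
        for t in range(positions)
    )
    return hits / positions


if __name__ == "__main__":
    a = [1, 2, 3, 4]
    b = [5, 1, 6, 4]
    # Positions 0 and 2 share a value (1 and 4); position 1 does not.
    print(relative_frequency([a, b], window_size=2))  # 0.666...
```

This brute-force count is what an analytical estimate would be compared against; it does not reproduce the paper's quadratic-time formula.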
Funding: Supported by the National Natural Science Foundation of China (Grant No. 61772544), the National Basic Research Program (973 Program) (2014CB347800), the Hunan Provincial Natural Science Fund for Distinguished Young Scholars (2016JJ1002), and the Guangxi Cooperative Innovation Center of Cloud Computing and Big Data (YD16507 and YD17X11).
Abstract: It is essential to provide responses to queries within time deadlines, even if the responses are not exact and complete. To reduce query latency, systems usually partition large-scale data computations into a series of tasks over many processes and aggregate their outputs using aggregation trees to reduce the response time. An obstacle is that the processes involved in a query usually differ in speed, so not all processes can complete their tasks in time. This directly degrades the response quality (the number of outputs received by the root of an aggregation tree). In this paper, we propose a general aggregation tree model, Tarot, which maximizes response quality by systematically addressing the following challenges: (1) fine-grained partitioning of the query deadline along the multi-level aggregation tree; (2) learning the distribution of durations at each level of the aggregation tree to optimize the wait durations at aggregators; (3) adaptively reassigning tasks over processes according to their status; and (4) periodically aggregating the outputs received from the lower level to avoid missing the deadline. Prior models do not consider all four aspects simultaneously. Extensive evaluations indicate that Tarot adapts to multi-level trees and considerably improves response quality compared to prior work while guaranteeing the query deadline.