Funding: Supported by the Fundamental Research Funds for the Central Universities (Nos. FRF-TP-14025A1 and FRF-TP-15-025A2) and by the Key Technologies Research and Development Program of the 12th Five-Year Plan of China (No. 2013BAI13B06).
Abstract: In this paper, we study the skyline group problem over a data stream. An object dominates another object if it is not worse than the other object on all attributes and is better on at least one attribute. An object that is not dominated by any other object is a skyline object. The skyline group problem involves finding k-item groups that are not dominated by any other k-item group. Existing algorithms for finding skyline groups can only process static data. In many applications, however, data arrives as a stream and changes over time, so algorithms are needed that support skyline group queries on dynamic data. We propose new algorithms to find skyline groups over a data stream. They use three data structures, namely a hash table, a dominance graph, and a matrix, to store dominance information and update results incrementally. We conduct experiments on synthetic datasets to evaluate the performance of the proposed algorithms. The experimental results show that our algorithms can efficiently find skyline groups over a data stream.
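The dominance definition in the abstract above can be illustrated with a minimal Python sketch. It covers only single-object dominance and a naive static skyline, not the paper's streaming algorithms or its group-level dominance, which the abstract does not specify. The names `dominates` and `skyline`, the sample data, and the convention that smaller attribute values are "better" are all assumptions made for illustration.

```python
from typing import List, Sequence

Point = Sequence[float]


def dominates(a: Point, b: Point) -> bool:
    """Return True if a dominates b.

    Assumption: smaller values are better on every attribute.
    a dominates b when a is not worse on all attributes and
    strictly better on at least one.
    """
    not_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return not_worse and strictly_better


def skyline(points: List[Point]) -> List[Point]:
    """Naive O(n^2) skyline: keep points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical data: (price, distance) pairs, lower is better for both.
hotels = [(50, 3.0), (60, 1.0), (70, 2.0), (55, 3.5)]
print(skyline(hotels))
```

Here `(70, 2.0)` is dominated by `(60, 1.0)` and `(55, 3.5)` by `(50, 3.0)`, so only the first two points remain. A streaming setting would have to maintain this result as objects arrive and expire rather than recompute it from scratch.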
Funding: This work was supported by the National Key Research and Development Program of China (2020YFB1506703) and the National Natural Science Foundation of China (Grant No. 62072018).
Abstract: Data stream processing frameworks process stream data based on event time so that requests can be answered in real time. In practice, streaming data usually arrives out of order due to factors such as network delay. Frameworks commonly adopt the watermark mechanism to handle this disorder. A watermark is a special element with a timestamp that is inserted into the data stream; it helps the framework decide whether a received element is late and should therefore be discarded. Traditional watermark generation strategies are periodic and cannot dynamically adjust the watermark distribution to balance responsiveness and accuracy. This paper proposes an adaptive watermark generation mechanism based on a time series prediction model to address this limitation. The mechanism dynamically adjusts the frequency and timing of watermark distribution using the ratio of out-of-order data and other lateness properties of the data stream, improving system responsiveness while keeping result accuracy acceptable. We implement the proposed mechanism on top of Flink and evaluate it with real-world datasets. The experimental results show that our mechanism is superior to existing watermark distribution strategies in terms of both system responsiveness and result accuracy.
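How a periodic watermark decides lateness can be sketched in a few lines of Python. This is a toy model of the traditional periodic baseline the abstract describes, not the paper's adaptive mechanism and not Flink's API. The class `PeriodicWatermarker`, its parameters `max_delay` and `period`, and the rule "watermark = largest event time seen minus a fixed delay" are assumptions chosen for illustration.

```python
class PeriodicWatermarker:
    """Toy periodic watermark generator (hypothetical, for illustration).

    Every `period` accepted events, the watermark advances to
    (largest event time seen) - max_delay. An event whose timestamp
    is at or below the current watermark is treated as late.
    """

    def __init__(self, max_delay: int, period: int) -> None:
        self.max_delay = max_delay
        self.period = period
        self.max_event_time = float("-inf")
        self.watermark = float("-inf")
        self.accepted = 0

    def process(self, event_time: int) -> bool:
        """Return True if the event is accepted, False if it is late."""
        if event_time <= self.watermark:
            return False  # late: the watermark has already passed it
        self.max_event_time = max(self.max_event_time, event_time)
        self.accepted += 1
        if self.accepted % self.period == 0:
            # Periodic emission; the watermark never moves backwards.
            self.watermark = max(
                self.watermark, self.max_event_time - self.max_delay
            )
        return True


wm = PeriodicWatermarker(max_delay=2, period=3)
results = [wm.process(t) for t in [1, 4, 3, 7, 2, 8]]
print(results)
```

In this run the event with timestamp 3 is out of order but still accepted, while the event with timestamp 2 arrives after the watermark has advanced to 2 and is dropped. A larger `max_delay` or `period` admits more stragglers at the cost of slower results, which is the responsiveness versus accuracy trade-off that an adaptive strategy tries to tune.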