Abstract
This paper proposes a continuous top-k query method for high-speed out-of-order data streams. A cache-based approach waits for late tuples without sorting the buffered data, and the cache duration adapts itself using statistics collected at run time. A modified MinTopk algorithm then computes the top-k result set of the current window. Experimental results show that the method achieves efficient top-k queries over high-speed out-of-order data streams: it computes the minimum cache duration that still guarantees the minimum accuracy allowed by the user, which reduces query latency.
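The idea summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `min_cache_duration` and `topk_over_stream`, the tumbling event-time windows, the delay measure (largest event time seen so far minus the tuple's event time), and the quantile rule for picking the cache duration are all assumptions made for illustration. The standard-library `heapq.nlargest` stands in for the modified MinTopk algorithm, which the abstract does not describe in detail.

```python
import heapq
import math
from collections import defaultdict


def min_cache_duration(delays, min_accuracy):
    """Smallest delay bound covering at least `min_accuracy` of observed delays.

    Assumption: coverage of observed delays approximates result accuracy.
    """
    if not delays:
        return 0
    ordered = sorted(delays)
    # Position of the smallest delay whose rank fraction reaches min_accuracy.
    idx = max(0, math.ceil(min_accuracy * len(ordered)) - 1)
    return ordered[idx]


def topk_over_stream(tuples, window_size, k, min_accuracy):
    """Yield (window_index, top-k scores) for tumbling event-time windows.

    `tuples` is an iterable of (event_time, score) pairs in arrival order.
    """
    buckets = defaultdict(list)  # window index -> scores, kept unsorted
    delays = []                  # observed lateness of every tuple so far
    max_seen = None              # largest event time seen so far
    next_window = None           # next window to emit, set by the first tuple

    for event_time, score in tuples:
        max_seen = event_time if max_seen is None else max(max_seen, event_time)
        delays.append(max_seen - event_time)
        w = event_time // window_size
        if next_window is None:
            next_window = w
        if w >= next_window:
            buckets[w].append(score)  # append only, the buffer is never sorted
        # Otherwise the tuple's window was already emitted and it is dropped.

        slack = min_cache_duration(delays, min_accuracy)
        # Emit every window whose end lies behind (max_seen - slack).
        while (next_window + 1) * window_size <= max_seen - slack:
            yield next_window, heapq.nlargest(k, buckets.pop(next_window, []))
            next_window += 1

    # End of stream: flush the windows that are still buffered.
    for w in sorted(buckets):
        yield w, heapq.nlargest(k, buckets[w])
```

With the hypothetical stream `[(1, 5), (4, 3), (2, 9), (11, 1), (9, 8), (15, 2), (26, 4)]`, a window size of 10, `k = 2`, and `min_accuracy = 1.0`, the late tuple `(9, 8)` still reaches window 0 because the observed delay of 2 keeps that window open until event time 12 has passed. Re-sorting all delays on every tuple keeps the sketch short; a practical version would more likely maintain a bounded sample or histogram of delays.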
Authors
WU Shouxiao (武守晓); FANG Jun (房俊)
Beijing Key Laboratory on Integration and Analysis of Large-scale Stream Data, North China University of Technology, Beijing 100144, China; Institute of Data Engineering, North China University of Technology, Beijing 100144, China
Source
Journal of Zhengzhou University: Natural Science Edition (《郑州大学学报(理学版)》)
Peking University Core Journal (北大核心)
2021, Issue 3, pp. 93-99 (7 pages)
Funding
National Key Research and Development Program of China (2017YFC0804406)
National Natural Science Foundation of China (61672042)
Keywords
high-speed out-of-order data stream
continuous top-k query
self-adaptive cache duration
query latency