
Distributed temporal index for temporal aggregation range query
Abstract In the era of big data and cloud computing, querying and analyzing temporal big data faces many important challenges. To address the poor performance of temporal aggregation range queries and their ineffective use of indexes, a Distributed Temporal Index (DTI) for temporal aggregation range queries is proposed. First, the temporal data is partitioned with a random or round-robin strategy. Second, an intra-partition index is built by a construction algorithm based on the bit-array prefix of the timestamp, and partition statistics, including the time span, are recorded. Third, predicate pushdown selects the data partitions whose time span overlaps the query time interval, and an index scan pre-aggregates them. Finally, the pre-aggregated values from all partitions are merged and aggregated by time. Experimental results show that the intra-partition index construction algorithm takes a similar execution time on data with a density of 2400 entries per unit of time and on data with a density of 0.001 entries per unit of time. Compared with the ParTime algorithm, the index-based aggregation query algorithm takes at least 22% less time in every step when querying the data in the first 75% of the timeline, and at least 11% less time in every step when executing selective aggregation functions. The index is therefore faster in most temporal aggregation range query tasks, and its intra-partition construction algorithm handles the data sparsity problem with high efficiency.
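The four steps in the abstract can be illustrated with a minimal single-process sketch. It is not the authors' DTI implementation: the abstract does not specify the index layout, so the sketch assumes point-timestamped `(timestamp, value)` records with non-negative integer timestamps, uses the high bits of the timestamp (`ts >> PREFIX_SHIFT`) as a stand-in for the "bit-array prefix", and uses SUM as the aggregate. All function names and the `PREFIX_SHIFT` constant are hypothetical.

```python
from collections import defaultdict

# Assumption: dropping the low 4 bits of a timestamp yields its bucket key.
PREFIX_SHIFT = 4


def round_robin_partition(records, n_partitions):
    """Step 1: assign (timestamp, value) records to partitions in round-robin order."""
    partitions = [[] for _ in range(n_partitions)]
    for i, record in enumerate(records):
        partitions[i % n_partitions].append(record)
    return partitions


def build_partition_index(partition):
    """Step 2: bucket records by timestamp prefix and record the partition time span."""
    buckets = defaultdict(list)
    for ts, value in partition:
        buckets[ts >> PREFIX_SHIFT].append((ts, value))
    timestamps = [ts for ts, _ in partition]
    span = (min(timestamps), max(timestamps)) if timestamps else None
    return {"buckets": dict(buckets), "span": span}


def pre_aggregate(index, start, end):
    """Step 3b: sum values per timestamp in [start, end], visiting only candidate buckets."""
    result = defaultdict(int)
    lo, hi = start >> PREFIX_SHIFT, end >> PREFIX_SHIFT
    for prefix, rows in index["buckets"].items():
        if lo <= prefix <= hi:
            for ts, value in rows:
                if start <= ts <= end:
                    result[ts] += value
    return result


def range_sum_by_time(indexes, start, end):
    """Steps 3a and 4: prune partitions by time span, then merge pre-aggregates by time."""
    total = defaultdict(int)
    for index in indexes:
        span = index["span"]
        if span is None or span[1] < start or span[0] > end:
            continue  # time span does not overlap the query interval
        for ts, partial in pre_aggregate(index, start, end).items():
            total[ts] += partial
    return dict(sorted(total.items()))


records = [(1, 10), (5, 20), (17, 5), (17, 7), (40, 1), (5, 3)]
indexes = [build_partition_index(p) for p in round_robin_partition(records, 3)]
print(range_sum_by_time(indexes, 4, 20))
```

In a real distributed setting each partition would live on a separate worker and only the small pre-aggregated dictionaries would be shipped to the merge step; the time-span check is what lets non-overlapping partitions be skipped without being scanned.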
Authors MENG Fanjun; HAN Bin; HUANG Shucheng; MEI Xiangdong (School of Computer, Jiangsu University of Science and Technology, Zhenjiang, Jiangsu 212100, China; Xsuperzone Technology Company Limited, Changzhou, Jiangsu 213022, China)
Source Journal of Computer Applications (indexed in CSCD and the Peking University Core Journals list), 2024, No. 6, pp. 1848-1854 (7 pages)
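Funding Supported by the Open Fund for Innovative Research on Overall Ship Performance of the Taihu Laboratory of Deepsea Technological Science (25422217).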
Keywords temporal index; temporal data; distributed; temporal aggregation; counting sort