Abstract
Distributed bursty term query on the Spark Streaming framework, which monitors when terms burst in a data stream, is an active research topic. Most existing methods store and count every term and do not single out hot terms. As data volumes grow explosively, the bursty times of hot terms are the more valuable information. To address this, we propose a distributed bursty term query algorithm that uses a dynamic update strategy and a checkpoint mechanism to extract hot terms, and that finds the bursty time range in linear time. Experimental results show that the proposed algorithm outperforms existing algorithms.
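The abstract names two steps, checkpoint-based hot-term extraction and a linear-time search for the bursty time range, without giving their details. The single-machine Python sketch below illustrates one plausible reading of each and is not the authors' algorithm. The function names, the pruning rule (drop terms below `min_count` at every checkpoint), the per-window `baseline`, and the use of a Kadane-style maximum-sum scan are all assumptions introduced for illustration. Spark Streaming is deliberately left out.

```python
from collections import Counter


def hot_terms(stream, checkpoint_interval, min_count):
    """Count terms, pruning rare ones at every checkpoint.

    Hypothetical policy: after every `checkpoint_interval` items, any term
    whose running count is still below `min_count` is discarded, so memory
    is spent only on candidate hot terms.
    """
    counts = Counter()
    for i, term in enumerate(stream, start=1):
        counts[term] += 1
        if i % checkpoint_interval == 0:
            # Checkpoint: collect the cold terms first, then delete them.
            for cold in [t for t, c in counts.items() if c < min_count]:
                del counts[cold]
    return counts


def bursty_range(window_counts, baseline):
    """Return the inclusive (start, end) window range with the largest
    total excess over `baseline`, or None if no range has positive excess.

    This is a Kadane-style single pass, so it runs in linear time.
    """
    best_sum, best = 0, None
    cur_sum, cur_start = 0, 0
    for i, count in enumerate(window_counts):
        if cur_sum <= 0:
            # A non-positive prefix cannot help; start a new range here.
            cur_sum, cur_start = 0, i
        cur_sum += count - baseline
        if cur_sum > best_sum:
            best_sum, best = cur_sum, (cur_start, i)
    return best


stream = ["a", "b", "a", "c", "a", "b", "a", "d"]
hot = hot_terms(stream, checkpoint_interval=4, min_count=2)
burst = bursty_range([1, 1, 5, 6, 1, 1], baseline=2)
```

With this pruning rule the retained counts are lower bounds: a term dropped at one checkpoint starts again from zero if it reappears. That trade-off is a property of the sketch, not a claim about the paper.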
Authors
郑诗敏
秦小麟
刘亮
周倩
ZHENG Shi-min, QIN Xiao-lin, LIU Liang, ZHOU Qian (College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China)
Source
《计算机科学》
CSCD
Peking University Core Journal (北大核心)
2017, Issue 3, pp. 10-15, 35 (7 pages)
Computer Science
Funding
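National Natural Science Foundation of China (61373015, 61300052)
Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD)
Jiangsu Province Major Scientific and Technological Achievements Transformation Fund (BA2013049)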