期刊文献+

MapReduce并行加速数据流多模式相似性搜索 被引量:5

Accelerating parallel searching similar multiple patterns from data streams by using MapReduce
下载PDF
导出
摘要 设计时间序列数据在Hadoop分布式文件系统(HDFS)中的有效存储方式,利用分布式缓存工具Distributed Cache将各子序列分发到Hadoop集群的计算节点上,将动态时间弯曲距离矩阵划分成多个子矩阵,采取并行迭代计算每条反对角线上子矩阵的方法,基于MapReduce编程模型,实现高效并行计算时间序列动态弯曲距离,通过改进剪裁冗余计算方法,设计实现一种数据流多模式相似性搜索并行算法。中国雪深长时间序列数据集的实验结果表明,当每条时间序列的长度达到5 000以上时,并行计算动态弯曲距离所需时间少于串行计算所需时间,当每条时间序列的长度达到9 000以上时,参与计算的集群节点越多,并行计算所需时间越少;当模式长度达到4 000、参与计算的集群节点数达5个以上时,从数据流中并行搜索出与模式匹配的相似子序列所需时间约为串行搜索所需时间的20%。 The effective storage mode for time series was designed on Hadoop Distributed File System ( HDFS), the sub- series were distributed to the compute nodes on Hadoop cluster by applying Distributed Cache tool, and the matrix of dynamic time warping distances was partitioned into several sub-matrixes. Based on MapReduce programming mode, by parallel computing sub-matrixes in each back-diagonal iteratively, the parallel computation of dynamic time warping distances was implemented, and an efficient parallel algorithm for searching similar patterns from data streams was developed by improving pruning redundant computation. The experimental results on the data set of snow depth long time series in China show that when the length of each time series is equal to or longer than 5 000, the required time of parallel computing dynamic time warping distances is less than that of the corresponding sequential computation, and when the length of each time series is equal to or longer than 9000, the more the compute nodes used, the less the required parallel computation time; furthermore, when the length of each pattern is equal to or longer than 4000 and the number of compute nodes is equal to or larger than 5, the required time of parallel searching similar sub-series from data streams is 20% of the corresponding sequential searching time.
出处 《计算机应用》 CSCD 北大核心 2017年第1期37-41,53,共6页 journal of Computer Applications
基金 广西自然科学基金资助项目(2014GXNSFAA118396)~~
关键词 时间序列 数据流 动态时间弯曲距离 模式搜索 HADOOP time series data stream dynamic time warping distance pattern searching Hadoop
  • 相关文献

参考文献4

二级参考文献33

  • 1陈国良.并行算法的可扩放性分析[J].小型微型计算机系统,1995,16(2):10-16. 被引量:12
  • 2Park JH,George KM.Efficient parallel hardware algorithms for string matching.Microprocessors and Microsystems,1999,23(3):155-168.
  • 3Lester B.The Art of Parallel Programming.Englewood Cliffs:Prentice Hall,1993.
  • 4Alan AB,Mei A.A residue number system on reconfigurable mesh with applications to prefix sums and approximate string matching.IEEE Trans.on Parallel and Distributed Systems,2000,11(11):1186-1199.
  • 5Pan Y,Li Y,Li J,LI K,Zheng SQ.Efficient parallel algorithms for distance maps of 2-D binary images using an optical bus.IEEE Trans.on Systems,Man,and Cybernetics-Part A:Systems and Humans,2002,32(2):228-236.
  • 6Han Y,Pan Y,Shen H.Sublogarithmic deterministic selection on arrays with a reconfigurable optical bus.IEEE Trans.on Computers,2002,51(6):702-706.
  • 7Navarro G.A guided tour to approximate string matching.ACM Computing Surveys,2001,33(1):31-88.
  • 8Navarro G,Baeza-Yates R.A hybrid indexing method for approximate string matching.Journal of Discrete Algorithms,2000,1(1):21-49.
  • 9Lee H-C,Ercal F.RMESH algorithms for parallel string matching.In:Proc.of the 3rd Int'l.Symp.on Parallel Architectures,Algorithms,and Networks(I-SPAN'97).Los Alamitos:IEEE Computer Society Press,1997.223~226.http://ieeexplore.ieee.org/ xpl/tocresult.jsp?i
  • 10Jiang Y,Wright AH.O(k)parallel algorithms for approximate string matching.Journal of Neural Parallel and Scientific Computation,1993,1:443-452.

共引文献37

同被引文献61

引证文献5

二级引证文献9

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部