期刊文献+

Blocking optimized SIMD tree search on modern processors 被引量:2

Blocking optimized SIMD tree search on modern processors
下载PDF
导出
摘要 Tree search is a widely used fundamental algorithm. Modern processors provide tremendous computing power by integrating multiple cores, each with a vector processing unit. This paper reviews some studies on exploiting single instruction multiple date (SIMD) capacity of processors to improve the performance of tree search, and proposes several improvement methods on reported SIMD tree search algorithms. Based on blocking tree structure, blocking for memory alignment and dynamic blocking prefetch are proposed to optimize the overhead of memory access. Furthermore, as a way of non-linear loop unrolling, the search branch unwinding shows that the number of branches can exceed the data width of SIMD instructions in the SIMD search algorithm. The experiments suggest that blocking optimized SIMD tree search algorithm can achieve 1.6 times response speed faster than the un-optimized algorithm. Tree search is a widely used fundamental algorithm. Modern processors provide tremendous computing power by integrating multiple cores, each with a vector processing unit. This paper reviews some studies on exploiting single instruction multiple date (SIMD) capacity of processors to improve the performance of tree search, and proposes several improvement methods on reported SIMD tree search algorithms. Based on blocking tree structure, blocking for memory alignment and dynamic blocking prefetch are proposed to optimize the overhead of memory access. Furthermore, as a way of non-linear loop unrolling, the search branch unwinding shows that the number of branches can exceed the data width of SIMD instructions in the SIMD search algorithm. The experiments suggest that blocking optimized SIMD tree search algorithm can achieve 1.6 times response speed faster than the un-optimized algorithm.
出处 《Journal of Shanghai University(English Edition)》 CAS 2011年第5期437-444,共8页 上海大学学报(英文版)
基金 Project supported by the Shanghai Leading Academic Discipline Project(Grant No.J50103) the Graduate Student Innovation Foundation of Shanghai University(Grant No.SHUCX112167)
关键词 single instruction multiple date (SIMD) tree search binary search streaming SIMD extensions (SSE) Cell broadband engine (BE) single instruction multiple date (SIMD), tree search, binary search, streaming SIMD extensions (SSE), Cell broadband engine (BE)
  • 相关文献

参考文献13

  • 1ElM C, CHHUGANI J, SATISH N, SEDLAR E, NGUYEN A D, KALDEWAY T, LEE V W, BRANDT S A, DUBEY P. FAST: Fast architecture sensitive tree search on modern CPUs and GPUs [C]//The 2010 International Conference of SIMOD, Indianapolis, USA. 2010:339 450.
  • 2杨毅,王禹桥.中文分词词典机制:次字拼音首字母哈希机制[J].计算机工程与设计,2010,31(6):1369-1371. 被引量:2
  • 3KNUTH D E, The art of computer programming, volume III: Sorting and searching {M]. Baston: Addison- Wesley, 1973.
  • 4SCHLEGEL B, GEMULLA R, LEHNER W. k-Ary search on modern processors [C]// Proceedings of the 5th International Workshop on Data Management on New Hardware. Providence. Rhode Island. 2009: 52-60.
  • 5LIN Hai-bo, XIE Hai-bo, SHAO Ling, WANG Yuan-hong. Cell BE processor programming guide [M]. Beijing: Publishing House of Electronics Industry, 2008 (in Chinese).
  • 6GEDIK B, BORDAWEKAR R R, YU P S. Cellsort: High performance sorting on the Cell processor [C]// Proceedings of the 33rd International Conference on Very Large Date Bases, Vienna, Austria. 2009: 52-60.
  • 7Ross K A. Efficient hash probes on modern processors [C]// IEEE the 23rd International Conference on Data Engineering, Istanbul, Turkey. 2007: 1297-1301.
  • 8KIM C, SEDLAR E, CHttUGANI J,KALDEWAY T. NGUYEN A, DIBLAS A, LEE V, SATISH N, DUBEY P. Sort vs. hash revisited: Fast join implementation on multi-core CPUs [J]. Proceedings of the VLDB Endowment, 2009, 2(2): 1378 1389.
  • 9ZHou J, Ross K A. Implementing databasse operations using sired instructions [C]// Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data, New York, USA. 2002:145- 156.
  • 10KALDEWEY T, HAGEN J, BLAS A D, NEDLAR E. Parabel search on video cards [C]// The First USENIX Workshop on Hot Topics in Parallelism, Berkeley, CA. 2009.

二级参考文献9

共引文献1

同被引文献8

引证文献2

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部