期刊文献+

面向非规则大数据分析应用的多核帮助线程预取方法 被引量:4

Multi-core helper thread prefetching for irregular data intensive applications
下载PDF
导出
摘要 大数据分析应用往往采用基于大型稀疏图的遍历算法,其主要特点是非规则数据密集访存。以频繁使用的具有大型稀疏图遍历特征的介度中心算法为例,提出一种基于帮助线程的多参数预取控制模型和参数优化方法,从而达到提高非规则数据密集程序性能的目的。在商用多核平台Q6600和I7上运用该方法后,介度中心算法在不同规模输入下平均性能加速比分别为1.20和1.11。实验结果表明,帮助线程预取能够有效提升该类非规则应用程序的性能。 Big data analysis applications often use sparse graph traversal algorithm which characterized by irregular data intensive memory access. For improving performance of memory access in sparse graph traversal algorithm, helper thread prefetching could convert discontinuous locality into continuous-instant spatial-temporal locality effectively by using the shared last level cache of chip multi-processor platforms. Betweenness centrality algorithm was used as a case study, the multi-parameter prefetching model of helper thread and optimized instances were presented and evaluated on commercial CMP platforms Q6600 and 17, the average speedup of betweenness centrality algorithm at different input scale is 1.20 and 1.11 respectively. The experiment results show that helper thread prefetching can improve the performance of irregular applications effectively.
出处 《通信学报》 EI CSCD 北大核心 2014年第8期137-146,共10页 Journal on Communications
基金 国家自然科学基金资助项目(61070029 61370062)~~
关键词 帮助线程预取 非规则数据密集应用 介度中心性 helper thread prefetching irregular data intensive applications betweenness centrality
  • 相关文献

参考文献19

  • 1SOFFER A, HEILD M. Big data meets social analytics[A]. Proc of 2012 Conference of IBM Lotusphere[C]. Orlando, Florida, USA, 2012.
  • 2BRANDES U. A faster algorithm for betweenness centrality[J]. Jour- nal of Mathematical Sociology, 2001, 25(2):163-177.
  • 3BADER D, MADDURI K, GILBERT J, et al. Designing scalable synthetic compact applications for benchmarking high productivity computing systems[J]. CTWatch Quarterly, 2006, 4B(2):1-10.
  • 4LUK C. Tolerating memory latency through soitware controlled pre-execution in simultaneous multithreading processors[A]. Proc of the 28th Annual International Symposium on Computer Architecture (ISCA)[C]. O6tehnrg, Sweden, 2001.40-51.
  • 5SMITH JE. Decoupled access/execute computer architectures[A]. Proc of the 9th International Symposium on Computer Architecture (ISCA)[C]. Austin, TX, USA, 1982. 112-119.
  • 6SONG Y, KALOGEROPULOS S, TIRUMALAI E Design and im- plementation of a compiler framework for helper threading on multi- core processors[A]. Proc of the 14th International Conference on Par- allel Architectures and Compilation Techniques (PACT)[C]. Saint Louis, MO, USA, 2005.99-109.
  • 7KIM D, LIAO S, WANG P, et al. Physical experimentation with pre- fetching helper threads on Inters hyper-threaded processors[A]. Proc of the International Symposium on Code generation and Optimization (CGO)[C]. Palo Alto, Calif, 2004.27-38.
  • 8LEE J, JUNG C, KIM D, et al. Prefetching with helper threads for lossely coupled multiprocessor systems[J]. 1EEE Transactions on Par- allel and Distributed System, 2009, 20(9): 1309-1324.
  • 9LU J, DAS A, HSU W, et al. Dynamic helper threaded prefetching on the Sun UltraSparc CMP processor[A]. Proc of 38th Annual IEEE/ACM International Symposium Micro Architecture (MI- CRO)[C]. Barcelona, Spain, 2005.93-104.
  • 10ZHOU J, CIESLEWICZ J, ROSS K, et aL Improving database per- formance on simultaneous multithreading processors[A]. Proc of the 31th International Conference on Very Large Data Bases[C]. Trond- helm, Norway, 2005.49-60.

二级参考文献15

  • 1del Sol A, Fujihashi H, O'Meara P. Topology of small-world networks of protein--Protein complex structures. Bioinformatics, 2005,21(8):1311-1315. [doi: 10.1093/bioinformaticsPoti167].
  • 2Liljeros F, Edling CR, Amaral LAN, Stanley HE, Aberg Y. The Web of human sexual contacts. Nature, 2001,411(6840):907-908. [doi: 10.1038/35082140].
  • 3Krebs VE. Mapping networks of terrorist cells. Connections, 2002,24(3):43-52.
  • 4Freeman LC. A set of measures of centrality based on betweenness. Sociomtry, 1977,40(1):35-41. [doi: 10.2307/3033543].
  • 5Newman ME J, Girvan M. Finding and evaluating community structure in networks. Physical Review E, 2004,69(2): 2611301-2611315. [doi: 10. I 103/PhysRevE.69.026113].
  • 6Crauser A, Mehlhorn K, Meyer U, Sanders P. A parallelization of dijkstras shortest path algorithm. Lecture Notes in Computer Science, 1998,1450:722-731. [doi: 10.1007/BFb0055823].
  • 7Grama AY, Kumar V. A survey of parallel search algorithms for discrete optimization problems. ORSA Journal on Computing, 1993,7. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=l 0.1.1.45.9937.
  • 8Klein PN, Subramanian S. A randomized parallel algorithm for single-source shortest paths. Journal of Algorithms, 1997,25(2): 205-220. [doi: 10.1006/jagm.1997.0888].
  • 9Bader DA, Madduri K. Parallel algorithms for evaluating centrality indices in real-world networks. In: Feng WC, ed. Proe. of the 35th Int'l Conf. on Parallel Processing (ICPP 2006). Washington: IEEE Computer Society, 2006. 539-550. [doi: 10.1109/ICPP. 2006.57].
  • 10Bader DA, Madduri K. Designing multithreaded algorithms for breadth-first search and st-connectivity on the cray rata-2. In: Feng WC, ed. Proc. of the 35th Int'l Conf. on Parallel Processing (ICPP 2006). Washington: IEEE Computer Society, 2006. 523-530. [doi: 10.1109/ICPP.2006.34].

共引文献6

同被引文献33

  • 1蔡曲林,刘普寅.一种新的概率神经网络有监督学习算法[J].模糊系统与数学,2006,20(6):83-87. 被引量:12
  • 2Chen T F and Bo:r J L. A performance study of software and hardware data prefetehing schemes[C]. Proceedings of 21st International Symposium on Computer Architecture, Chicago, USA, 1994: 223-232.
  • 3Saavedra R H and Daeyeon P. Improving the effectiveness of software prefetching with adaptive execution[C]. Proceedings of Conference on Parallel Architectures and Compilation Techniques, Boston, USA, 1996: 68-78.
  • 4Hut I and Lin C. Feedback mechanisms for improving probabilistic memory prefetching[C]. Proceedings of 15th International Symposium on High Performance Computer Architecture, North Carolina, USA, 2009: 443-454.
  • 5Dongkeun K, Liao S S W, Wang P H, et al: Physical experimentation with prefetching helper threads on Intel's hyper-thremied processorsIC]. Proceedings of International Symposium on Code Generation and Optimization, California, USA, 2004: 27-38.
  • 6Lu J. Design and implementation of a lightweight runtime optimization system on modern computer architectures[D]. [Ph.D. dissertation], University of Minnesota, 2006.
  • 7Ro W W and Gaudiot J L. Speculative pre-execution assisted by compiler (SPEAR)[J]. Journal of Parallel and Distributed Computing, 2006, 66(8): 1076-1089.
  • 8Somogyi S, Wenisch T F, Ailamaki A, et al: Spatial-temporal memory streaming[C]. Proceedings of thc 36th International Symposium on Computer Architecture, Austin, USA, 2009: 69-80.
  • 9Lee J, Jung C, Lim D, et al: Prefetching with helper threads for loosely coupled multiprocessor systems[J]. IEEE Transactions on Parallel and Distributed Systems, 2009, 20(9): 1309-1324.
  • 10Maxin G, McCurdy C, and Vetter J S. Diagnosis and optimization of application prefetching performance[C]. Proceedings of the 27th International ACM Conference on International Conference on Supercomputing, Oregon, USA,2013:303 312.

引证文献4

二级引证文献3

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部