面向非规则大数据分析应用的多核帮助线程预取方法被引量：4

Multi-core helper thread prefetching for irregular data intensive applications

下载PDF

导出

摘要大数据分析应用往往采用基于大型稀疏图的遍历算法,其主要特点是非规则数据密集访存。以频繁使用的具有大型稀疏图遍历特征的介度中心算法为例,提出一种基于帮助线程的多参数预取控制模型和参数优化方法,从而达到提高非规则数据密集程序性能的目的。在商用多核平台Q6600和I7上运用该方法后,介度中心算法在不同规模输入下平均性能加速比分别为1.20和1.11。实验结果表明,帮助线程预取能够有效提升该类非规则应用程序的性能。 Big data analysis applications often use sparse graph traversal algorithm which characterized by irregular data intensive memory access. For improving performance of memory access in sparse graph traversal algorithm, helper thread prefetching could convert discontinuous locality into continuous-instant spatial-temporal locality effectively by using the shared last level cache of chip multi-processor platforms. Betweenness centrality algorithm was used as a case study, the multi-parameter prefetching model of helper thread and optimized instances were presented and evaluated on commercial CMP platforms Q6600 and 17, the average speedup of betweenness centrality algorithm at different input scale is 1.20 and 1.11 respectively. The experiment results show that helper thread prefetching can improve the performance of irregular applications effectively.

作者张建勋古志民胡潇涵蔡旻

机构地区北京理工大学计算机学院天津中医药大学网络中心

出处《通信学报》 EI CSCD 北大核心 2014年第8期137-146,共10页 Journal on Communications

基金国家自然科学基金资助项目(61070029 61370062)~~

关键词帮助线程预取非规则数据密集应用介度中心性 helper thread prefetching irregular data intensive applications betweenness centrality

分类号 TP311 [自动化与计算机技术—计算机软件与理论]

引文网络
相关文献

参考文献19

1SOFFER A, HEILD M. Big data meets social analytics[A]. Proc of 2012 Conference of IBM Lotusphere[C]. Orlando, Florida, USA, 2012.
2BRANDES U. A faster algorithm for betweenness centrality[J]. Jour- nal of Mathematical Sociology, 2001, 25(2):163-177.
3BADER D, MADDURI K, GILBERT J, et al. Designing scalable synthetic compact applications for benchmarking high productivity computing systems[J]. CTWatch Quarterly, 2006, 4B(2):1-10.
4LUK C. Tolerating memory latency through soitware controlled pre-execution in simultaneous multithreading processors[A]. Proc of the 28th Annual International Symposium on Computer Architecture (ISCA)[C]. O6tehnrg, Sweden, 2001.40-51.
5SMITH JE. Decoupled access/execute computer architectures[A]. Proc of the 9th International Symposium on Computer Architecture (ISCA)[C]. Austin, TX, USA, 1982. 112-119.
6SONG Y, KALOGEROPULOS S, TIRUMALAI E Design and im- plementation of a compiler framework for helper threading on multi- core processors[A]. Proc of the 14th International Conference on Par- allel Architectures and Compilation Techniques (PACT)[C]. Saint Louis, MO, USA, 2005.99-109.
7KIM D, LIAO S, WANG P, et al. Physical experimentation with pre- fetching helper threads on Inters hyper-threaded processors[A]. Proc of the International Symposium on Code generation and Optimization (CGO)[C]. Palo Alto, Calif, 2004.27-38.
8LEE J, JUNG C, KIM D, et al. Prefetching with helper threads for lossely coupled multiprocessor systems[J]. 1EEE Transactions on Par- allel and Distributed System, 2009, 20(9): 1309-1324.
9LU J, DAS A, HSU W, et al. Dynamic helper threaded prefetching on the Sun UltraSparc CMP processor[A]. Proc of 38th Annual IEEE/ACM International Symposium Micro Architecture (MI- CRO)[C]. Barcelona, Spain, 2005.93-104.
10ZHOU J, CIESLEWICZ J, ROSS K, et aL Improving database per- formance on simultaneous multithreading processors[A]. Proc of the 31th International Conference on Very Large Data Bases[C]. Trond- helm, Norway, 2005.49-60.

二级参考文献15

1del Sol A, Fujihashi H, O'Meara P. Topology of small-world networks of protein--Protein complex structures. Bioinformatics, 2005,21(8):1311-1315. [doi: 10.1093/bioinformaticsPoti167].
2Liljeros F, Edling CR, Amaral LAN, Stanley HE, Aberg Y. The Web of human sexual contacts. Nature, 2001,411(6840):907-908. [doi: 10.1038/35082140].
3Krebs VE. Mapping networks of terrorist cells. Connections, 2002,24(3):43-52.
4Freeman LC. A set of measures of centrality based on betweenness. Sociomtry, 1977,40(1):35-41. [doi: 10.2307/3033543].
5Newman ME J, Girvan M. Finding and evaluating community structure in networks. Physical Review E, 2004,69(2): 2611301-2611315. [doi: 10. I 103/PhysRevE.69.026113].
6Crauser A, Mehlhorn K, Meyer U, Sanders P. A parallelization of dijkstras shortest path algorithm. Lecture Notes in Computer Science, 1998,1450:722-731. [doi: 10.1007/BFb0055823].
7Grama AY, Kumar V. A survey of parallel search algorithms for discrete optimization problems. ORSA Journal on Computing, 1993,7. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=l 0.1.1.45.9937.
8Klein PN, Subramanian S. A randomized parallel algorithm for single-source shortest paths. Journal of Algorithms, 1997,25(2): 205-220. [doi: 10.1006/jagm.1997.0888].
9Bader DA, Madduri K. Parallel algorithms for evaluating centrality indices in real-world networks. In: Feng WC, ed. Proe. of the 35th Int'l Conf. on Parallel Processing (ICPP 2006). Washington: IEEE Computer Society, 2006. 539-550. [doi: 10.1109/ICPP. 2006.57].
10Bader DA, Madduri K. Designing multithreaded algorithms for breadth-first search and st-connectivity on the cray rata-2. In: Feng WC, ed. Proc. of the 35th Int'l Conf. on Parallel Processing (ICPP 2006). Washington: IEEE Computer Society, 2006. 523-530. [doi: 10.1109/ICPP.2006.34].

共引文献6

1毛国勇,张宁.基于多核心工作站机群的并行介数算法[J].上海理工大学学报,2012,34(6):527-530.
2刘加海,杨茂林,雷航,廖勇.共享资源约束下多核实时任务分配算法[J].浙江大学学报（工学版）,2014,48(1):113-117. 被引量：4
3王敏,曹宝香,王蕾,冯晓兵.介度中心和PageRank算法应用场景分析[J].计算机工程,2015,41(12):299-304.
4王敏,王蕾,冯晓兵,曹宝香.基于顶点加权的介度中心近似算法研究[J].计算机研究与发展,2016,53(7):1631-1640.
5黄姝娟,容晓峰,肖锋,茹媛.基于跳表与布隆过滤器的混合关键任务调度方法[J].计算机工程,2017,43(1):86-92.
6王皎,李兴智,宋丫.跳表在嵌入式软件上的应用与研究[J].信息通信,2019,0(5):24-25.

同被引文献33

1蔡曲林,刘普寅.一种新的概率神经网络有监督学习算法[J].模糊系统与数学,2006,20(6):83-87. 被引量：12
2Chen T F and Bo:r J L. A performance study of software and hardware data prefetehing schemes[C]. Proceedings of 21st International Symposium on Computer Architecture, Chicago, USA, 1994: 223-232.
3Saavedra R H and Daeyeon P. Improving the effectiveness of software prefetching with adaptive execution[C]. Proceedings of Conference on Parallel Architectures and Compilation Techniques, Boston, USA, 1996: 68-78.
4Hut I and Lin C. Feedback mechanisms for improving probabilistic memory prefetching[C]. Proceedings of 15th International Symposium on High Performance Computer Architecture, North Carolina, USA, 2009: 443-454.
5Dongkeun K, Liao S S W, Wang P H, et al: Physical experimentation with prefetching helper threads on Intel's hyper-thremied processorsIC]. Proceedings of International Symposium on Code Generation and Optimization, California, USA, 2004: 27-38.
6Lu J. Design and implementation of a lightweight runtime optimization system on modern computer architectures[D]. [Ph.D. dissertation], University of Minnesota, 2006.
7Ro W W and Gaudiot J L. Speculative pre-execution assisted by compiler (SPEAR)[J]. Journal of Parallel and Distributed Computing, 2006, 66(8): 1076-1089.
8Somogyi S, Wenisch T F, Ailamaki A, et al: Spatial-temporal memory streaming[C]. Proceedings of thc 36th International Symposium on Computer Architecture, Austin, USA, 2009: 69-80.
9Lee J, Jung C, Lim D, et al: Prefetching with helper threads for loosely coupled multiprocessor systems[J]. IEEE Transactions on Parallel and Distributed Systems, 2009, 20(9): 1309-1324.
10Maxin G, McCurdy C, and Vetter J S. Diagnosis and optimization of application prefetching performance[C]. Proceedings of the 27th International ACM Conference on International Conference on Supercomputing, Oregon, USA,2013:303 312.

引证文献4

1黄艳,张启坤,段赵磊,古志民.基于缓存行为特征的线程数据预取距离控制策略[J].电子与信息学报,2015,37(7):1633-1638. 被引量：1
2张建勋,古志民.帮助线程预取质量的实时在线评价方法[J].计算机应用,2017,37(1):114-119. 被引量：2
3张建勋,乔欣雨,林炳辉.深度学习在多核缓存预取中的应用研究综述[J].计算机应用研究,2024,41(2):341-347.
4张建勋,古志民.基于交织预取率的帮助线程预取质量调节算法[J].计算机应用研究,2019,36(2):430-434. 被引量：2

二级引证文献3

1余龙龙,韩林.基于申威GCC编译器的间接预取算法[J].计算机系统应用,2022,31(8):203-211.
2张建勋,乔欣雨,林炳辉.深度学习在多核缓存预取中的应用研究综述[J].计算机应用研究,2024,41(2):341-347.
3张建勋,古志民.基于交织预取率的帮助线程预取质量调节算法[J].计算机应用研究,2019,36(2):430-434. 被引量：2

1付铮.德州仪器多核技术显著提升无线电网络控制器整体效率[J].电信工程技术与标准化,2008,21(10):92-92.
2德州仪器多核技术提升无线电网络控制器整体效率[J].移动通信,2008,32(20):65-65.
3华北工控推出新一代存储产品——DS-1600系列[J].计算机测量与控制,2009,17(10):2104-2104.
4华北工控推出DS-1600系列存储产品[J].测控技术,2009,28(11):102-102.
5徐开勇,龚雪容,成茂才.基于改进Apriori算法的审计日志关联规则挖掘[J].计算机应用,2016,36(7):1847-1851. 被引量：50
6杜恒,龚茜茹.图的深度优先遍历的C语言实现[J].九江职业技术学院学报,2004(2):26-28. 被引量：2
7潘东静,宁玉富,刘建军.图遍历的演示[J].德州高专学报,2000,16(4):21-23.
8李龙江,申山宏,刘来军.面向数据密集应用的云计算服务架构[J].微型机与应用,2012,31(15):60-63. 被引量：2
9杨旭.求最小生成树的另一算法及其与其它算法的比较[J].重庆电力高等专科学校学报,2003,8(2):49-52. 被引量：2
10王敏,曹宝香,王蕾,冯晓兵.介度中心和PageRank算法应用场景分析[J].计算机工程,2015,41(12):299-304.

通信学报

2014年第8期

浏览历史

内容加载中请稍等...

面向非规则大数据分析应用的多核帮助线程预取方法被引量：4

参考文献19

二级参考文献15

共引文献6

同被引文献33

引证文献4

二级引证文献3

相关作者

相关机构

相关主题

浏览历史

面向非规则大数据分析应用的多核帮助线程预取方法 被引量：4

参考文献19

二级参考文献15

共引文献6

同被引文献33

引证文献4

二级引证文献3

相关作者

相关机构

相关主题

浏览历史

面向非规则大数据分析应用的多核帮助线程预取方法被引量：4