An enhanced GPU reduction at the warp-level

Abstract: In recent years, graphics processing unit (GPU)-accelerated intelligent algorithms have been widely used to solve combinatorial optimization problems, which are NP-hard. These algorithms share a common operation, reduction, in which the most suitable candidate solution in the neighborhood is selected. Because reduction is one of their main procedures, it needs to be optimized on the GPU. In this paper, we propose an enhanced warp-based reduction on the GPU. Compared with existing block-based reduction methods, our method efficiently exploits the potential of a warp-level implementation, which better matches the characteristics of current GPU architectures. First, vectorized access is used to improve global memory access performance. Second, at the thread-block level, an enhanced warp-based reduction in shared memory is presented to form partial results. Third, the number of thread blocks is configured by maximizing the thread-block size and the number of threads per streaming multiprocessor on the GPU. Finally, the proposed method is evaluated on three generations of NVIDIA GPUs and achieves better performance than previous methods.
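The three steps named in the abstract can be illustrated with a small CPU-side sketch. This is not the authors' kernel: it is a plain Python emulation of the data flow, assuming a minimum reduction (one way to select the "best" candidate), a warp size of 32, and a vector width of 4. The function names `warp_reduce_min`, `block_reduce_min`, and `num_blocks`, and the sizing formula in `num_blocks`, are hypothetical and reflect only one reading of the abstract.

```python
WARP_SIZE = 32  # threads per warp on NVIDIA GPUs


def warp_reduce_min(lanes):
    """Tree-reduce one warp's 32 values to their minimum in 5 halving steps."""
    vals = list(lanes)
    assert len(vals) == WARP_SIZE
    offset = WARP_SIZE // 2
    while offset > 0:
        # Lane i combines its value with the value held by lane i + offset.
        for lane in range(offset):
            vals[lane] = min(vals[lane], vals[lane + offset])
        offset //= 2
    return vals[0]


def block_reduce_min(data, block_size=256, vector_width=4):
    """Emulate one thread block: vectorized loads, then two warp-level passes."""
    assert block_size % WARP_SIZE == 0
    assert block_size // WARP_SIZE <= WARP_SIZE
    identity = float("inf")

    # Step 1: each emulated thread reads vector_width adjacent elements per
    # iteration (standing in for a vectorized load) and keeps a private minimum.
    private = [identity] * block_size
    stride = block_size * vector_width
    for tid in range(block_size):
        for base in range(tid * vector_width, len(data), stride):
            for x in data[base:base + vector_width]:
                private[tid] = min(private[tid], x)

    # Step 2: every warp reduces its own 32 private values; the list of
    # per-warp results plays the role of a small shared-memory array.
    shared = [warp_reduce_min(private[w:w + WARP_SIZE])
              for w in range(0, block_size, WARP_SIZE)]

    # Step 3: a single warp reduces the per-warp results, padded to 32 lanes
    # with the identity element.
    shared += [identity] * (WARP_SIZE - len(shared))
    return warp_reduce_min(shared)


def num_blocks(num_sms, max_threads_per_sm, block_size):
    """Assumed sizing rule: enough blocks to fill every SM with resident threads."""
    return num_sms * (max_threads_per_sm // block_size)
```

For example, `block_reduce_min([5, 3, 9], block_size=64)` returns `3`, and with hypothetical hardware figures of 16 multiprocessors, 2048 threads per multiprocessor, and 1024-thread blocks, `num_blocks(16, 2048, 1024)` returns `32`. On a real GPU, each block would write its partial result to global memory and a final pass would combine them.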

Authors: Hou Neng, He Fazhi, Zhou Yi

Affiliation: School of Computer Science and Technology

Source: Computer Aided Drafting, Design and Manufacturing (计算机辅助绘图设计与制造, English edition), 2016, No. 2, pp. 43-52 (10 pages)

Funding: Supported by the National Natural Science Foundation of China (61472289) and the Natural Science Foundation of Hubei Province (2015CFB254)

Keywords: reduction; graphical processing unit; computing unified device architecture; warp-level reduction

Classification: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]
