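This paper presents a high-performance, low-power fixed-point special function unit circuit for mobile vertex processors. The unit supports the fixed-point data format of the embedded graphics standard OpenGL ES 1.X and computes elementary functions, namely reciprocal, square root, reciprocal square root, logarithm and exponential, with 16 fractional bits of precision. The elementary functions are approximated by piecewise quadratic polynomial interpolation. Introducing a "2" operation circuit into coefficient processing reduces the overall size of the quadratic polynomial lookup tables by 29% compared with conventional designs at the same precision. The computation error and truncation error of the quadratic interpolation algorithm are optimized so that the lookup table size and the area and speed of the squarer, multiplier and adder are optimal. The circuit is implemented in a 0.18 μm CMOS process with an area of 0.112 mm², runs at a clock frequency of 300 MHz, and consumes only 12.8 mW. Test results show that this fixed-point special function unit is well suited to elementary function computation in mobile graphics vertex processors.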
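The abstract does not give the segment count, coefficient widths or exact datapath, so the Python sketch below only illustrates the general technique of piecewise quadratic table lookup in s15.16 fixed point (the 32-bit `GLfixed` format of OpenGL ES 1.x), using 1/x on the reduced interval [1, 2). `SEG_BITS`, `COEF_BITS`, `build_table` and `eval_fixed` are hypothetical names and parameters. Coefficients here come from interpolating through three points per segment, which is an assumption and may differ from the authors' error-optimized coefficients; range reduction is omitted. The evaluation uses one squarer, two multipliers and an adder, loosely mirroring the components named above.

```python
# Sketch (assumptions labelled): piecewise quadratic interpolation of 1/x on
# [1, 2) in s15.16 fixed point. SEG_BITS and COEF_BITS are hypothetical.

FRAC_BITS = 16                 # s15.16: 16 fractional bits
ONE = 1 << FRAC_BITS
SEG_BITS = 6                   # hypothetical: 2**6 = 64 segments on [1, 2)
COEF_BITS = 30                 # hypothetical fractional bits per coefficient
H_BITS = FRAC_BITS - SEG_BITS  # low bits of x that form the offset h


def build_table(f, seg_bits=SEG_BITS, coef_bits=COEF_BITS):
    """Return integer (c0, c1, c2) per segment so f(a + h) ~ c0 + c1*h + c2*h^2."""
    n = 1 << seg_bits
    w = 1.0 / n
    table = []
    for i in range(n):
        a = 1.0 + i * w
        # Interpolate through h = 0, w/2, w using Newton divided differences.
        y0, y1, y2 = f(a), f(a + w / 2), f(a + w)
        d1 = (y1 - y0) / (w / 2)
        d2 = ((y2 - y1) / (w / 2) - d1) / w
        coeffs = (y0, d1 - d2 * w / 2, d2)  # expanded to monomial form in h
        table.append(tuple(round(c * (1 << coef_bits)) for c in coeffs))
    return table


RECIP_TABLE = build_table(lambda x: 1.0 / x)


def eval_fixed(x, table=RECIP_TABLE):
    """Evaluate the table for an s15.16 input x with 1.0 <= x < 2.0."""
    assert ONE <= x < 2 * ONE, "range reduction to [1, 2) is not shown"
    frac = x - ONE
    i = frac >> H_BITS                      # segment index: top fraction bits
    h = frac & ((1 << H_BITS) - 1)          # offset in segment, units of 2**-16
    c0, c1, c2 = table[i]
    h_sq = h * h                            # squarer
    acc = (c0
           + ((c1 * h) >> FRAC_BITS)        # multiplier: c1 * h
           + ((c2 * h_sq) >> (2 * FRAC_BITS)))  # multiplier: c2 * h^2
    shift = COEF_BITS - FRAC_BITS
    return (acc + (1 << (shift - 1))) >> shift  # round to nearest s15.16
```

For example, `eval_fixed(3 * ONE // 2)` returns 43691, which is 65536 / 1.5 rounded to nearest. With these assumed parameters the interpolation error is about 0.01 LSB, so the rounded output stays within one LSB of the exact reciprocal over the whole interval.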
Breadth-first search (BFS) is an important kernel for graph traversal and is used by many graph processing applications. Extensive studies have been devoted to boosting the performance of BFS. GPU acceleration, the most effective solution, achieves the state-of-the-art result of 3.3×10⁹ traversed edges per second on an NVIDIA Tesla C2050 GPU. A novel vertex-frontier-based GPU BFS algorithm is proposed, with three main features. First, to obtain better workload balance on irregular graphs, a virtual-queue task decomposition and mapping strategy is introduced for vertex frontier expansion. Second, a global deduplication scheme is proposed to effectively remove duplicate vertices from the vertex frontier. Finally, a GPU-based bottom-up BFS approach is employed to process large frontiers. Experimental results demonstrate that the algorithm achieves a 10% improvement over the state-of-the-art method on diverse graphs. In particular, it achieves a 2 to 3 times speedup over the state of the art on low-diameter and scale-free graphs on an NVIDIA Tesla K20c GPU, reaching a peak traversal rate of 11.2×10⁹ edges/s.
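The following is a sequential Python sketch of the general ideas above, not the authors' GPU implementation: a level-synchronous frontier BFS whose top-down step collects candidate vertices with duplicates (as independent parallel threads would) and then runs a deduplication pass, and which switches to a bottom-up step when the frontier is large. The GPU virtual-queue task mapping is not modeled. The function name `bfs_levels` and the switching threshold `alpha` are hypothetical, and the graph is assumed to be undirected with each edge stored in both directions.

```python
from typing import List


def bfs_levels(adj: List[List[int]], source: int, alpha: float = 0.05) -> List[int]:
    """Return the BFS depth of every vertex (-1 if unreachable).

    adj: undirected adjacency list (each edge listed in both directions).
    alpha: hypothetical threshold; if the frontier holds more than
    alpha * n vertices, the bottom-up step is used instead of top-down.
    """
    n = len(adj)
    level = [-1] * n
    level[source] = 0
    frontier = [source]
    depth = 0
    while frontier:
        depth += 1
        if len(frontier) > alpha * n:
            # Bottom-up: each unvisited vertex searches for a parent in the frontier.
            in_frontier = [False] * n
            for v in frontier:
                in_frontier[v] = True
            next_frontier = []
            for v in range(n):
                if level[v] == -1 and any(in_frontier[u] for u in adj[v]):
                    level[v] = depth
                    next_frontier.append(v)
        else:
            # Top-down: expand every frontier vertex. Like independent threads,
            # several frontier vertices may discover the same neighbour.
            candidates = [u for v in frontier for u in adj[v] if level[u] == -1]
            # Global deduplication pass: keep each discovered vertex once.
            next_frontier = list(dict.fromkeys(candidates))
            for u in next_frontier:
                level[u] = depth
        frontier = next_frontier
    return level


if __name__ == "__main__":
    graph = [[1, 2], [0, 3], [0, 3], [1, 2, 4], [3], []]
    print(bfs_levels(graph, 0, alpha=1.0))  # [0, 1, 1, 2, 3, -1]
```

With `alpha=1.0` every step is top-down and with `alpha=0.0` every step is bottom-up; both give the same levels, and only the amount of edge work differs. In the example, vertex 3 is discovered twice by the top-down step (from vertices 1 and 2) and the deduplication pass keeps a single copy.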
Funding: Projects (61272142, 61103082, 61003075, 61170261, 61103193) supported by the National Natural Science Foundation of China; project supported by the Program for New Century Excellent Talents in University of China; Projects (2012AA01A301, 2012AA010901) supported by the National High Technology Research and Development Program of China.