Funding: Supported by the National Science Foundation of USA (Nos. CCF-0833122, CCF-0925863, CCF-0937907, CNS-0720531, and OCI-0904534) and by the Department of Energy (National Nuclear Security Administration) under Award Number DE-SC0008717; also partly supported by the European FP7 project TERAFLUX, id. 249013.
Abstract: In the upcoming exa-scale era, exploiting data locality in parallel programs is very important because it benefits both program performance and energy efficiency. However, this is hard for graph algorithms such as Breadth First Search (BFS) because of their irregular data access patterns. This study analyzes the exploitation of data locality in BFS and its impact on energy efficiency under the Codelet fine-grain, dataflow-inspired execution model. The Codelet Model exploits data locality more efficiently than OpenMP-like execution models, which traditionally focus on coarse-grain parallelism inside loops. A BFS algorithm is then given that exploits the locality between two loop iterations belonging to two different loops (inter-loop locality). This kind of locality can be exploited by the Codelet Model but not by traditional coarse-grain execution models like OpenMP. Tests were performed on fsim, a simulation platform developed by Intel for the Ubiquitous High Performance Computing (UHPC) project to design future exa-scale architectures. The results show that this BFS algorithm saves up to 7% of the dynamic energy for memory accesses compared to a BFS implementation based on OpenMP loop scheduling.
Funding: Projects (61272142, 61103082, 61003075, 61170261, 61103193) supported by the National Natural Science Foundation of China; project supported by the Program for New Century Excellent Talents in University of China; projects (2012AA01A301, 2012AA010901) supported by the National High Technology Research and Development Program of China.
Abstract: Breadth-first search (BFS) is an important kernel for graph traversal and is used by many graph processing applications. Extensive studies have been devoted to boosting the performance of BFS. As the most effective solution, GPU acceleration achieves the state-of-the-art result of 3.3×10^9 traversed edges per second on an NVIDIA Tesla C2050 GPU. A novel vertex-frontier-based GPU BFS algorithm is proposed, with three main features. Firstly, to obtain a better workload balance for irregular graphs, a virtual-queue task decomposition and mapping strategy is introduced for vertex frontier expansion. Secondly, a global duplicate-detection scheme is proposed to remove duplicate vertices from the vertex frontier effectively. Finally, a GPU-based bottom-up BFS approach is employed to process large frontiers. The experimental results demonstrate that the algorithm achieves a 10% improvement over the state-of-the-art method on diverse graphs. In particular, it exhibits a 2-3 times speedup over the state-of-the-art on low-diameter and scale-free graphs on an NVIDIA Tesla K20c GPU, reaching a peak traversal rate of 11.2×10^9 edges/s.
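The frontier-based traversal with a bottom-up phase for large frontiers can be illustrated with a minimal sequential sketch. This is not the GPU algorithm from the abstract: the virtual-queue mapping is omitted, duplicate removal is reduced to a visited check, and the adjacency-list representation, the function names, and the `switch_fraction` threshold are all assumptions made for illustration. The bottom-up step also assumes an undirected graph.

```python
from typing import Dict, List, Set

# Assumed representation: undirected graph as adjacency lists.
Graph = Dict[int, List[int]]


def top_down_step(graph: Graph, frontier: Set[int],
                  depth: Dict[int, int], level: int) -> Set[int]:
    """Expand every frontier vertex; the visited check keeps duplicates out."""
    next_frontier: Set[int] = set()
    for u in frontier:
        for v in graph[u]:
            if v not in depth:
                depth[v] = level
                next_frontier.add(v)
    return next_frontier


def bottom_up_step(graph: Graph, frontier: Set[int],
                   depth: Dict[int, int], level: int) -> Set[int]:
    """Each unvisited vertex searches its neighbors for a frontier member."""
    next_frontier: Set[int] = set()
    for v in graph:
        if v in depth:
            continue
        for u in graph[v]:
            if u in frontier:
                depth[v] = level
                next_frontier.add(v)
                break  # one parent in the frontier is enough
    return next_frontier


def bfs(graph: Graph, source: int, switch_fraction: float = 0.25) -> Dict[int, int]:
    """Level-synchronous BFS; switch_fraction is a hypothetical heuristic."""
    depth = {source: 0}
    frontier = {source}
    level = 0
    while frontier:
        level += 1
        # Large frontier: bottom-up; small frontier: top-down.
        if len(frontier) > switch_fraction * len(graph):
            frontier = bottom_up_step(graph, frontier, depth, level)
        else:
            frontier = top_down_step(graph, frontier, depth, level)
    return depth
```

Calling `bfs(graph, 0)` returns a dictionary mapping each reachable vertex to its BFS level. Both step functions produce the same levels, so the threshold only affects how much work each level costs, not the result.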
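Abstract: First, the causes of power flow transfer and the phenomena that accompany it are analyzed. Next, the power flow transfer region and its delimitation are discussed; the traditional breadth first search (BFS) algorithm is improved, and a method for delimiting the region affected by power flow transfer is proposed. The three basic concepts that form the theoretical basis of the security assessment (model quantification, average power angle, and power flow transfer sensitivity) are each defined. Expressions for the power flow transfer model and its sensitivity are presented. An assessment method is proposed and a mathematical model for security assessment is established, yielding a comprehensive security assessment index whose use is also explained. A program for power flow transfer sensitivity and security assessment is developed and used for simulation-based validation on a real power grid case.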