In the upcoming exa-scale era, the exploitation of data locality in parallel programs is very important because it benefits both program performance and energy efficiency. However, this is a hard topic for graph algor...In the upcoming exa-scale era, the exploitation of data locality in parallel programs is very important because it benefits both program performance and energy efficiency. However, this is a hard topic for graph algorithms such as the Breadth First Search (BFS) due to the irregular data access patterns. This study analyzes the exploitation of data locality in the BFS and its impact on the energy efficiency with the Codelet fine-grain dataflow-inspired execution model. The Codelet Model more efficiently exploits data locality than the OpenMP-like execution models which traditionally focus on coarse-grain parallelism inside loops. A BFS algorithm is then given to exploit the locality between two loop iterations that belong to two different loops (inter-loop locality). This kind of locality can be exploited by the Codelet Model but not by traditional coarse-grain execution models like OpenMR Tests were performed on fsim which is a simulation platform developed by Intel for the Ubiquitous High Performance Computing (UHPC) project to design future exa-scale architectures. The results show that this BFS algorithm saves up to 7% of the dynamic energy for memory accesses compared to a BFS implementation based on OpenMP loop scheduling.展开更多
基金National Science Foundation of USA(Nos.CCF-0833122,CCF-0925863,CCF-0937907,CNS-0720531,and OCI-0904534)supported by the Department of Energy(National Nuclear Security Administration)under the Award Number DE-SC0008717.Moreoverpartly supported by European FP7 project TERAFLUX,id.249013
文摘In the upcoming exa-scale era, the exploitation of data locality in parallel programs is very important because it benefits both program performance and energy efficiency. However, this is a hard topic for graph algorithms such as the Breadth First Search (BFS) due to the irregular data access patterns. This study analyzes the exploitation of data locality in the BFS and its impact on the energy efficiency with the Codelet fine-grain dataflow-inspired execution model. The Codelet Model more efficiently exploits data locality than the OpenMP-like execution models which traditionally focus on coarse-grain parallelism inside loops. A BFS algorithm is then given to exploit the locality between two loop iterations that belong to two different loops (inter-loop locality). This kind of locality can be exploited by the Codelet Model but not by traditional coarse-grain execution models like OpenMR Tests were performed on fsim which is a simulation platform developed by Intel for the Ubiquitous High Performance Computing (UHPC) project to design future exa-scale architectures. The results show that this BFS algorithm saves up to 7% of the dynamic energy for memory accesses compared to a BFS implementation based on OpenMP loop scheduling.