wrBench: Comparing Cache Architectures and Coherency Protocols on ARMv8 Many-Core Systems
Authors: 高琬蓉, 方建滨, 黄春, 徐传福, 王峥. Journal of Computer Science & Technology (SCIE, EI, CSCD), 2023, Issue 6, pp. 1323-1338 (16 pages).
Cache performance is a critical design constraint for modern many-core systems. Since the cache often works in a "black-box" manner, it is difficult for the software to reason about the cache behavior to match the running software to the underlying hardware. To better support code optimization, we need to understand and characterize the cache behavior. While cache performance characterization is heavily studied on traditional x86 architectures, there is little work for understanding the cache implementations on emerging ARMv8-based many-cores. This paper presents a comprehensive study to evaluate the cache architecture design on three representative ARMv8 multi-cores, Phytium 2000+, ThunderX2, and Kunpeng 920 (KP920). To this end, we develop wrBench, a micro-benchmark suite to measure the realized latency and bandwidth of caches at different memory hierarchies when performing core-to-core communication. Our evaluation provides inter-core latency and bandwidth in different cache levels and coherency states for the three ARMv8 many-cores. The quantitative performance data is shown in tables. We mine the characteristics of caches and coherency protocols by analyzing the data for the three processors, Phytium 2000+, ThunderX2, and KP920. Our paper also provides discussions and guidelines for optimizing memory access on ARMv8 many-cores.
Keywords: ARMv8 many-core cache architecture microbenchmark core-to-core communication
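The abstract does not describe how wrBench works internally. As general background only, latency micro-benchmarks often rely on pointer chasing: each load depends on the result of the previous one, so the hardware cannot overlap or prefetch them. The sketch below is not wrBench's code; the function names are hypothetical. It builds such a dependent chain with Sattolo's algorithm, which guarantees a single cycle through every slot. Python shows only the access pattern; a real measurement would need C or assembly with threads pinned to specific cores.

```python
import random


def build_chase_chain(n, seed=0):
    """Return a list nxt where nxt[i] is the index to visit after i.

    Sattolo's algorithm produces a permutation with exactly one cycle
    of length n, so a chase starting anywhere touches every slot once
    before returning to its start.
    """
    rng = random.Random(seed)
    nxt = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # 0 <= j < i, never i itself
        nxt[i], nxt[j] = nxt[j], nxt[i]
    return nxt


def chase(nxt, steps, start=0):
    """Follow the chain for `steps` dependent hops and return the final index."""
    idx = start
    for _ in range(steps):
        idx = nxt[idx]
    return idx


def visits_all(nxt):
    """Check that the chain starting at 0 is one cycle covering all slots."""
    seen = set()
    idx = 0
    for _ in range(len(nxt)):
        seen.add(idx)
        idx = nxt[idx]
    return len(seen) == len(nxt) and idx == 0
```

In a native benchmark, the chain size would be chosen to fit a given cache level, and the time for a long chase divided by the number of hops would approximate the per-access latency.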
A Quantitative Evaluation of Vector Transcendental Functions on ARMv8-Based Processors
Authors: 沈洁, 龙标, 黄春. Journal of Computer Science & Technology (SCIE, EI, CSCD), 2023, Issue 3, pp. 686-701 (16 pages).
Transcendental functions are important functions in various high performance computing applications. Because these functions are time-consuming and the vector units on modern processors become wider and more scalable, there is an increasing demand for developing and using vector transcendental functions in such performance-hungry applications. However, the performance of vector transcendental functions as well as their accuracy remain largely unexplored. To address this issue, we perform a comprehensive evaluation of two Single Instruction Multiple Data (SIMD) intrinsics based vector math libraries on two ARMv8 compatible processors. We first design dedicated microbenchmarks that help us understand the performance behavior of vector transcendental functions. Then, we propose a piecewise, quantitative evaluation method with a set of meaningful metrics to quantify their performance and accuracy. By analyzing the experimental results, we find that vector transcendental functions achieve good performance speedups thanks to the vectorization and algorithm optimization. Moreover, vector math libraries can replace scalar math libraries in many cases because of improved performance and satisfactory accuracy. Despite this, the implementations of vector math libraries are still immature, which means further optimization is needed, and our evaluation reveals feasible optimization solutions for future vector math libraries.
Keywords: transcendental function vector math library piecewise quantitative evaluation microbenchmarking ARMv8-based processor
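The abstract does not list the paper's metrics. One widely used accuracy measure for math functions is the error in units in the last place (ULP), and a piecewise evaluation can be approximated by reporting the worst ULP error separately for each input interval. The sketch below illustrates that idea under stated assumptions and is not the authors' method: `sin_taylor` is a hypothetical stand-in for a library routine, `math.sin` serves as the reference (a real study would likely use a higher-precision reference), and `math.ulp` requires Python 3.9 or later.

```python
import math


def ulp_error(approx, exact):
    """Distance between approx and exact, in ULPs of the exact double value."""
    return abs(approx - exact) / math.ulp(exact)


def sin_taylor(x):
    """Hypothetical candidate: degree-7 Taylor polynomial of sin around 0."""
    return x - x**3 / 6 + x**5 / 120 - x**7 / 5040


def piecewise_max_ulp(func, ref, lo, hi, pieces=4, samples=1000):
    """Split [lo, hi) into equal pieces; return (start, end, worst ULP) per piece."""
    width = (hi - lo) / pieces
    report = []
    for p in range(pieces):
        a = lo + p * width
        b = a + width
        worst = 0.0
        for k in range(samples):
            x = a + (b - a) * k / samples  # evenly spaced samples in [a, b)
            worst = max(worst, ulp_error(func(x), ref(x)))
        report.append((a, b, worst))
    return report


if __name__ == "__main__":
    for a, b, worst in piecewise_max_ulp(sin_taylor, math.sin, 0.0, 2.0):
        print(f"[{a:.2f}, {b:.2f}): max error {worst:.3g} ULP")
```

Reporting per interval rather than one global maximum shows where in the input domain an implementation loses accuracy; for this stand-in, the error grows as inputs move away from zero.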