期刊文献+
共找到2篇文章
< 1 >
每页显示 20 50 100
A Pipelining Loop Optimization Method for Dataflow Architecture 被引量:2
1
作者 xu Tan Xiao-Chun Ye +6 位作者 Xiao-Wei Shen yuan-chao xu Da Wang Lunkai Zhang Wen-Ming Li Dong-Rui Fan Zhi-Min Tang 《Journal of Computer Science & Technology》 SCIE EI CSCD 2018年第1期116-130,共15页
With the coming of exascale supercomputing era, power efficiency has become the most important obstacle to build an exascale system. Dataflow architecture has native advantage in achieving high power efficiency for sc... With the coming of exascale supercomputing era, power efficiency has become the most important obstacle to build an exascale system. Dataflow architecture has native advantage in achieving high power efficiency for scientific applications. However, the state-of-the-art dataflow architectures fail to exploit high parallelism for loop processing. To address this issue, we propose a pipelining loop optimization method (PLO), which makes iterations in loops flow in the processing element (PE) array of dataflow accelerator. This method consists of two techniques, architecture-assisted hardware iteration and instruction-assisted software iteration. In hardware iteration execution model, an on-chip loop controller is designed to generate loop indexes, reducing the complexity of computing kernel and laying a good f(mndation for pipelining execution. In software iteration execution model, additional loop instructions are presented to solve the iteration dependency problem. Via these two techniques, the average number of instructions ready to execute per cycle is increased to keep floating-point unit busy. Simulation results show that our proposed method outperforms static and dynamic loop execution model in floating-point efficiency by 2.45x and 1.1x on average, respectively, while the hardware cost of these two techniques is acceptable. 展开更多
关键词 dataflow model control-flow model loop optimization exascale computing scientific application
原文传递
Reducing Synchronization Cost for Single-Level Store in Mobile Systems
2
作者 yuan-chao xu Hu Wan +2 位作者 Ke-Ni Qiu Tao Li Wei-Gong Zhang 《Journal of Computer Science & Technology》 SCIE EI CSCD 2016年第4期836-848,共13页
Emerging byte-addressable non-volatile memory technologies, such as phase change memory (PCM) and spin- transfer torque RAM (STT-RAM), offer both the byte-addressability of memory and the durability of storage, th... Emerging byte-addressable non-volatile memory technologies, such as phase change memory (PCM) and spin- transfer torque RAM (STT-RAM), offer both the byte-addressability of memory and the durability of storage, thus making it feasible to build single-level store systems. To ensure the consistency of persistent data structures in the presence of power failures or system crashes, it requires flushing cache lines to persistent memory frequently, thus incurring non-trivial synchronization overhead. To mitigate this issue, we propose two techniques. First, we use non-volatile STT-RAM as scratchpad memory on chip to store recovery information, thereby eliminating synchronization cost in the logging phase due to the avoidance of off-chip logging operations. Second, we present an adaptive synchronization policy based on caching modes in terms of data access patterns, thereby eliminating unnecessary synchronization cost in the checkpoint phase. Evaluation results indicate that the two techniques improve the overall performance from 2.15x to 2.39x compared with conventional transactional persistent memory. 展开更多
关键词 crash consistency synchronization cost persistent memory power failure mobile system
原文传递
上一页 1 下一页 到第
使用帮助 返回顶部