Journal Articles
4 articles found
1. Bigflow: A General Optimization Layer for Distributed Computing Frameworks (Cited by 1)
Authors: Yun-Cong Zhang, Xiao-Yang Wang, Cong Wang, Yao Xu, Jian-Wei Zhang, Xiao-Dong Lin, Guang-Yu Sun, Gong-Lin Zheng, Shan-Hui Yin, Xian-Jin Ye, Li Li, Zhan Song, Dong-Dong Miao. Journal of Computer Science & Technology (SCIE, EI, CSCD), 2020, Issue 2, pp. 453-467 (15 pages).
As data volumes grow rapidly, distributed computations are widely employed in data centers to provide cheap and efficient methods for processing large-scale parallel datasets. Various computation models have been proposed to improve the abstraction of distributed datasets and hide the details of parallelism. However, most of them follow the single-layer partitioning method, which makes it difficult for developers to express multi-level partitioning operations succinctly. To overcome this problem, we present the NDD (Nested Distributed Dataset) data model, a more compact and expressive extension of Spark RDD (Resilient Distributed Dataset) that removes the burden on developers of manually writing the logic for multi-level partitioning cases. Based on the NDD model, we develop an open-source framework called Bigflow, which serves as an optimization layer over the computation engines of the most widely used processing frameworks. With the help of Bigflow, advanced optimization techniques that would otherwise be applied manually only by experienced programmers are enabled automatically in a distributed data processing job. Currently, Bigflow processes about 3 PB of data daily in the data centers of Baidu. According to user experience, it significantly reduces code length and improves performance compared with the intuitive programming style.
Keywords: distributed computing, programming model, optimization technique
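To make the contrast between single-layer and multi-level partitioning concrete, below is a minimal, self-contained Python sketch of what a nested-dataset abstraction can look like. The NestedDataset class, its method names, and the toy records are hypothetical illustrations, not Bigflow's or Spark's actual API.

```python
from collections import defaultdict


class NestedDataset:
    """A tiny in-memory stand-in for a distributed nested dataset (illustrative only)."""

    def __init__(self, records):
        self.records = list(records)

    def group_by(self, key_fn):
        """First-level partitioning: split records into per-key child datasets."""
        groups = defaultdict(list)
        for record in self.records:
            groups[key_fn(record)].append(record)
        return {key: NestedDataset(rows) for key, rows in groups.items()}

    def sum_by(self, key_fn, value_fn):
        """Aggregate inside one partition without re-encoding composite keys."""
        totals = defaultdict(int)
        for record in self.records:
            totals[key_fn(record)] += value_fn(record)
        return dict(totals)


# Toy click log: (site, user, bytes)
records = [("a.com", "u1", 10), ("a.com", "u1", 5),
           ("a.com", "u2", 7), ("b.com", "u3", 3)]

ds = NestedDataset(records)
# Two-level partitioning reads as nested operations instead of manual
# composite-key bookkeeping: group by site, then aggregate per user inside it.
per_site = {site: inner.sum_by(lambda r: r[1], lambda r: r[2])
            for site, inner in ds.group_by(lambda r: r[0]).items()}
print(per_site)  # {'a.com': {'u1': 15, 'u2': 7}, 'b.com': {'u3': 3}}
```

With a flat, single-layer model the same computation typically requires encoding (site, user) composite keys by hand, which is the kind of boilerplate the nested abstraction is meant to hide.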
2. Performance-Centric Optimization for Racetrack Memory Based Register File on GPUs
Authors: Yun Liang, Shuo Wang. Journal of Computer Science & Technology (SCIE, EI, CSCD), 2016, Issue 1, pp. 36-49 (14 pages).
The key to high performance for GPU architectures lies in their massive threading capability, which drives a large number of cores and enables execution overlapping among threads. However, in reality, the number of threads that can execute simultaneously is often limited by the size of the register file on GPUs. The traditional SRAM-based register file occupies such a large amount of chip area that it cannot scale to meet the increasing demand of GPU applications. Racetrack memory (RM) is a promising technology for designing a large-capacity register file on GPUs due to its high data storage density. However, without careful deployment of the RM-based register file, the lengthy shift operations of RM may hurt performance. In this paper, we explore RM for designing a high-performance register file for GPU architectures. High-density RM helps to improve thread-level parallelism (TLP), but if the bits of a register are not aligned to the access ports, shift operations are required to move the bits to the ports before they can be accessed, delaying the read/write operations. We develop an optimization framework for the RM-based register file on GPUs, which employs three optimization techniques at the application, compilation, and architecture levels, respectively. Specifically, we optimize the TLP at the application level, design a register mapping algorithm at the compilation level, and design a preshifting mechanism at the architecture level. Collectively, these optimizations help to determine the TLP without causing cache and register file resource contention and reduce the shift operation overhead. Experimental results using a variety of representative workloads demonstrate that our optimization framework achieves up to 29% (21% on average) performance improvement.
Keywords: register file, racetrack memory, GPU
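As a rough illustration of why register placement matters for racetrack memory, the following Python sketch models a single-port RM track where accessing a register costs one shift per cell of distance, and compares an arbitrary placement with a simple frequency-based one. The cost model, function names, and access trace are assumptions for illustration only; they are not the paper's register mapping algorithm or preshifting mechanism.

```python
# Toy shift-cost model for a racetrack memory track with one access port.
def shift_cost(placement, access_trace, port=0):
    """Count shift steps needed to move each accessed register's cell under the port."""
    position = port  # cell currently aligned with the access port
    cost = 0
    for reg in access_trace:
        target = placement[reg]
        cost += abs(target - position)  # one shift per cell of distance
        position = target
    return cost


def greedy_placement(access_trace, num_slots):
    """Place more frequently accessed registers closer to the port (slot 0)."""
    freq = {}
    for reg in access_trace:
        freq[reg] = freq.get(reg, 0) + 1
    hot_first = sorted(freq, key=freq.get, reverse=True)
    return {reg: slot for slot, reg in enumerate(hot_first[:num_slots])}


trace = ["r1", "r2", "r1", "r1", "r3", "r1", "r2"]
naive = {"r1": 2, "r2": 1, "r3": 0}            # arbitrary placement
tuned = greedy_placement(trace, num_slots=3)   # hottest register next to the port
print(shift_cost(naive, trace), shift_cost(tuned, trace))  # 9 7
```

Even in this toy setting, placing hot registers near the port reduces the total number of shifts, which is the intuition behind mapping and preshifting optimizations for RM-based register files.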
3. Area Efficient Pattern Representation of Binary Neural Networks on RRAM
Authors: Feng Wang, Guo-Jie Luo, Guang-Yu Sun, Yu-Hao Wang, Di-Min Niu, Hong-Zhong Zheng. Journal of Computer Science & Technology (SCIE, EI, CSCD), 2021, Issue 5, pp. 1155-1166 (12 pages).
Resistive random access memory (RRAM) has been demonstrated to implement multiply-and-accumulate (MAC) operations in a highly parallel analog fashion, which dramatically accelerates convolutional neural networks (CNNs). Since CNNs require considerable converters between the analog crossbars and the digital peripheral circuits, recent studies map binary neural networks (BNNs) onto RRAM and binarize the weights to {+1, -1}. However, the two mainstream representations for BNN weights introduce patterns of redundant 0s and 1s when dealing with negative weights. In this work, we reduce the area spent on redundant 0s and 1s by proposing a BNN weight representation framework based on a novel pattern representation and a corresponding architecture. First, we split the weight matrix into several small matrices by clustering adjacent columns together. Second, we extract 1s' patterns, i.e., the submatrices containing only 1s, from each small weight matrix, such that each final output can be represented as the sum of several patterns. Third, we map these patterns onto RRAM crossbars, including pattern computation crossbars (PCCs) and pattern accumulation crossbars (PACs). Finally, we compare the pattern representation with the two mainstream representations and adopt the more area-efficient one. The evaluation results demonstrate that our framework can save over 20% of crossbar area compared with the two mainstream representations.
Keywords: binary neural network (BNN), pattern, resistive random access memory (RRAM)
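The pattern idea can be illustrated with a small, self-contained Python sketch: columns of a binary weight block that share the same set of 1-rows reuse one partial sum. The grouping rule and the toy block below are simplifying assumptions made here for illustration; the paper's actual pattern extraction and its PCC/PAC crossbar mapping are more general.

```python
from collections import defaultdict


def extract_patterns(block):
    """Group columns by identical 1-row support: each group is an all-1s submatrix (a 'pattern')."""
    rows, cols = len(block), len(block[0])
    by_support = defaultdict(list)
    for c in range(cols):
        support = frozenset(r for r in range(rows) if block[r][c] == 1)
        by_support[support].append(c)
    return by_support  # {row set: [columns sharing it]}


def mac_with_patterns(block, x):
    """Each column output reuses the partial sum of its shared pattern."""
    out = [0] * len(block[0])
    for support, cols in extract_patterns(block).items():
        partial = sum(x[r] for r in support)  # computed once per pattern
        for c in cols:
            out[c] = partial
    return out


block = [[1, 1, 0],
         [1, 1, 1],
         [0, 0, 1]]
x = [3, 5, 2]
direct = [sum(block[r][c] * x[r] for r in range(3)) for c in range(3)]
print(mac_with_patterns(block, x), direct)  # both print [8, 8, 7]
```

Here three columns are covered by only two patterns, which hints at how sharing 1s' patterns across adjacent columns can shrink the crossbar area needed for the same MAC results.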
4. Preface
Authors: Wen-Guang Chen, Ying-Wei Luo, Guang-Yu Sun. Journal of Computer Science & Technology (SCIE, EI, CSCD), 2020, Issue 2, pp. 379-381 (3 pages).
The ACM SIGOPS ChinaSys conference is organized twice a year by ChinaSys, an active community for researchers and practitioners of computer systems in China. Since August 2015, ChinaSys has been an ACM SIGOPS chapter. The first ChinaSys conference was held in November 2011 in Shenzhen. It has since become a leading international forum for academia, industry, and government to present novel research results in the principle and practice of computer systems. All topic areas related to the design and implementation of computer systems are of interest and in scope.
Keywords: China, computer, happen