期刊文献+
共找到4篇文章
< 1 >
每页显示 20 50 100
Automatic Target Description File Generation
1
作者 耿洪娜 吕方 +3 位作者 钟茗 崔慧敏 薛景玲 冯晓兵 《Journal of Computer Science & Technology》 SCIE EI CSCD 2023年第6期1339-1355,共17页
Agile hardware design is gaining increasing momentum and bringing new chips in larger quantities to the market faster.However,it also takes new challenges for compiler developers to retarget existing compilers to thes... Agile hardware design is gaining increasing momentum and bringing new chips in larger quantities to the market faster.However,it also takes new challenges for compiler developers to retarget existing compilers to these new chips in shorter time than ever before.Currently,retargeting a compiler backend,e.g.,an LLVM backend to a new target,requires compiler developers to write manually a set of target description files(totalling 10300+lines of code(LOC)for RISC-V in LLVM),which is error-prone and time-consuming.In this paper,we introduce a new approach,Au-tomatic Target Description File Generation(ATG),which accelerates the generation of a compiler backend for a new tar-get by generating its target description files automatically.Given a new target,ATG proceeds in two stages.First,ATG synthesizes a small list of target-specific properties and a list of code-layout templates from the target description files of a set of existing targets with similar instruction set architectures(ISAs).Second,ATG requests compiler developers to fill in the information for each instruction in the new target in tabular form according to the list of target-specific properties syn-thesized and then generates its target description files automatically according to the list of code-layout templates synthe-sized.The first stage can often be reused by different new targets sharing similar ISAs.We evaluate ATG using nine RISC-V instruction sets drawn from a total of 1029 instructions in LLVM 12.0.ATG enables compiler developers to gen-erate compiler backends for these ISAs that emit the same assembly code as the existing compiler backends for RISC-V but with significantly less development effort(by specifying each instruction in terms of up to 61 target-specific properties only). 展开更多
关键词 retargetability COMPILER target description target backend automatic generator
原文传递
VTensor:Using Virtual Tensors to Build a Layout-Oblivious AI Programming Framework
2
作者 俞峰 赵家程 +2 位作者 崔慧敏 冯晓兵 薛京灵 《Journal of Computer Science & Technology》 SCIE EI CSCD 2023年第5期1074-1097,共24页
Tensors are a popular programming interface for developing artificial intelligence(AI)algorithms.Layout refers to the order of placing tensor data in the memory and will affect performance by affecting data locality;t... Tensors are a popular programming interface for developing artificial intelligence(AI)algorithms.Layout refers to the order of placing tensor data in the memory and will affect performance by affecting data locality;therefore the deep neural network library has a convention on the layout.Since AI applications can use arbitrary layouts,and existing AI systems do not provide programming abstractions to shield the layout conventions of libraries,operator developers need to write a lot of layout-related code,which reduces the efficiency of integrating new libraries or developing new operators.Furthermore,the developer assigns the layout conversion operation to the internal operator to deal with the uncertainty of the input layout,thus losing the opportunity for layout optimization.Based on the idea of polymorphism,we propose a layout-agnostic virtual tensor programming interface,namely the VTensor framework,which enables developers to write new operators without caring about the underlying physical layout of tensors.In addition,the VTensor framework performs global layout inference at runtime to transparently resolve the required layout of virtual tensors,and runtime layout-oriented optimizations to globally minimize the number of layout transformation operations.Experimental results demonstrate that with VTensor,developers can avoid writing layout-dependent code.Compared with TensorFlow,for the 16 operations used in 12 popular networks,VTensor can reduce the lines of code(LOC)of writing a new operation by 47.82%on average,and improve the overall performance by 18.65%on average. 展开更多
关键词 artificial intelligence(AI)programming layout-oblivious tensor processing
原文传递
Parallel Incremental Frequent Itemset Mining for Large Data 被引量:5
3
作者 Yu-Geng Song hui-min cui Xiao-Bing Feng 《Journal of Computer Science & Technology》 SCIE EI CSCD 2017年第2期368-385,共18页
Frequent itemset mining (FIM) is a popular data mining issue adopted in many fields, such as commodity recommendation in the retail industry, log analysis in web searching, and query recommendation (or related sea... Frequent itemset mining (FIM) is a popular data mining issue adopted in many fields, such as commodity recommendation in the retail industry, log analysis in web searching, and query recommendation (or related search). A large number of FIM algorithms have been proposed to obtain better performance, including parallelized algorithms for processing large data volumes. Besides, incremental FIM algorithms are also proposed to deal with incremental database updates. However, most of these incremental algorithms have low parallelism, causing low efficiency on huge databases. This paper presents two parallel incremental FIM algorithms called IncMiningPFP and IncBuildingPFP, implemented on the MapReduce framework. IncMiningPFP preserves the FP-tree mining results of the original pass, and utilizes them for incremental calculations. In particular, we propose a method to generate a partial FP-tree in the incremental pass, in order to avoid unnecessary mining work. Further, some of the incremental parallel tasks can be omitted when the inserted transactions include fewer items. IncbuildingPFP preserves the CanTrees built in the original pass, and then adds new transactions to them during the incremental passes. Our experimental results show that IncMiningPFP can achieve significant speedup over PFP (Parallel FPGrowth) and a sequential incremental algorithm (CanTree) in most cases of incremental input database, and in other cases IncBuildingPFP can achieve it. 展开更多
关键词 incremental parallel FPGrowth data mining frequent itemset mining MAPREDUCE
原文传递
A Hybrid Circular Queue Method for Iterative Stencil Computations on GPUs 被引量:1
4
作者 Yang Yang hui-min cui +1 位作者 Xiao-Bing Feng Jing-Ling Xue 《Journal of Computer Science & Technology》 SCIE EI CSCD 2012年第1期57-74,共18页
In this paper, we present a hybrid circular queue method that can significantly boost the performance of stencil computations on GPU by carefully balancing usage of registers and shared-memory. Unlike earlier methods ... In this paper, we present a hybrid circular queue method that can significantly boost the performance of stencil computations on GPU by carefully balancing usage of registers and shared-memory. Unlike earlier methods that rely on circular queues predominantly implemented using indirectly addressable shared memory, our hybrid method exploits a new reuse pattern spanning across the multiple time steps in stencil computations so that circular queues can be implemented by both shared memory and registers effectively in a balanced manner. We describe a framework that automatically finds the best placement of data in registers and shared memory in order to maximize the performance of stencil computations. Validation using four different types of stencils on three different GPU platforms shows that our hybrid method achieves speedups up to 2.93X over methods that use circular queues implemented with shared-memory only. 展开更多
关键词 stencil computation circular queue GPU OCCUPANCY REGISTER
原文传递
上一页 1 下一页 到第
使用帮助 返回顶部