期刊文献+
共找到9篇文章
< 1 >
每页显示 20 50 100
A survey of neural network accelerator with software development environments
1
作者 Jin Song Xuemeng Wang +2 位作者 Zhipeng Zhao Wei Li Tian Zhi 《Journal of Semiconductors》 EI CAS CSCD 2020年第2期20-28,共9页
Recent years,the deep learning algorithm has been widely deployed from cloud servers to terminal units.And researchers proposed various neural network accelerators and software development environments.In this article... Recent years,the deep learning algorithm has been widely deployed from cloud servers to terminal units.And researchers proposed various neural network accelerators and software development environments.In this article,we have reviewed the representative neural network accelerators.As an entirety,the corresponding software stack must consider the hardware architecture of the specific accelerator to enhance the end-to-end performance.And we summarize the programming environments of neural network accelerators and optimizations in software stack.Finally,we comment the future trend of neural network accelerator and programming environments. 展开更多
关键词 neural network accelerator compiling optimization programming environments
下载PDF
IOPS:computational graph optimization based on inter-operators parallel scheduling
2
作者 谢晓燕 XU Hao +1 位作者 ZHU Yun HE Wanqi 《High Technology Letters》 EI CAS 2023年第1期50-59,共10页
To improve the inference efficiency of convolutional neural networks(CNN),the existing neural networks mainly adopt heuristic and dynamic programming algorithms to realize parallel scheduling among operators.Heuristic... To improve the inference efficiency of convolutional neural networks(CNN),the existing neural networks mainly adopt heuristic and dynamic programming algorithms to realize parallel scheduling among operators.Heuristic scheduling algorithms can generate local optima easily,while the dynamic programming algorithm has a long convergence time for complex structural models.This paper mainly studies the parallel scheduling between operators and proposes an inter-operator parallelism schedule(IOPS)scheduling algorithm that guarantees the minimum similar execution delay.Firstly,a graph partitioning algorithm based on the largest block is designed to split the neural network model into multiple subgraphs.Then,the operators that meet the conditions is replaced according to the defined operator replacement rules.Finally,the optimal scheduling method based on backtracking is used to schedule the computational graph.Network models such as Inception-v3,ResNet-50,and RandWire are selected for testing.The experimental results show that the algorithm designed in this paper can achieve a 1.6×speedup compared with the existing sequential execution methods. 展开更多
关键词 compile optimization convolutional neural network(CNN) inter-operator parallelism schedule(IOPS) operator replacement
下载PDF
AN EFFICIENT GPU ACCELERATION FORMAT FOR FINITE ELEMENT ANALYSIS
3
作者 Tian Jin Li Gong +1 位作者 Fei Wu Zeng Guohui 《Journal of Electronics(China)》 2013年第6期599-608,共10页
This paper proposes a new Graphics Processing Unit(GPU)-accelerated storage format to speed up Sparse Matrix Vector Products(SMVPs) for Finite Element Method(FEM) analysis of electromagnetic problems.A new format call... This paper proposes a new Graphics Processing Unit(GPU)-accelerated storage format to speed up Sparse Matrix Vector Products(SMVPs) for Finite Element Method(FEM) analysis of electromagnetic problems.A new format called Modified Compile Time Optimization(MCTO) format is used to reduce much execution time and design for hastening the iterative solution of FEM equations especially when rows have uneven lengths.The MCTO-applied FEM is about 10 times faster than conventional FEM on a CPU,and faster than other row-major ordering formats on a GPU.Numerical results show that the proposed GPU-accelerated storage format turns out to be an excellent accelerator. 展开更多
关键词 Finite Element Method (FEM) Graphics Processing Unit (GPU) Parallelizationstrategy Modified Compile Time Optimization (MCTO)
下载PDF
Modified condition/decision coverage(MC/DC)oriented compiler optimization for symbolic execution
4
作者 Wei-jiang HONG Yi-jun LIU +2 位作者 Zhen-bang CHEN Wei DONG Ji WANG 《Frontiers of Information Technology & Electronic Engineering》 SCIE EI CSCD 2020年第9期1267-1284,共18页
Symbolic execution is an effective way of systematically exploring the search space of a program,and is often used for automatic software testing and bug finding.The program to be analyzed is usually compiled into a b... Symbolic execution is an effective way of systematically exploring the search space of a program,and is often used for automatic software testing and bug finding.The program to be analyzed is usually compiled into a binary or an intermediate representation,on which symbolic execution is carried out.During this process,compiler optimizations influence the effectiveness and efficiency of symbolic execution.However,to the best of our knowledge,there exists no work on compiler optimization recommendation for symbolic execution with respect to(w.r.t.)modified condition/decision coverage(MC/DC),which is an important testing coverage criterion widely used for mission-critical software.This study describes our use of a state-of-the-art symbolic execution tool to carry out extensive experiments to study the impact of compiler optimizations on symbolic execution w.r.t.MC/DC.The results indicate that instruction combining(IC)optimization is the important and dominant optimization for symbolic execution w.r.t.MC/DC.We designed and implemented a support vector machine based optimization recommendation method w.r.t.IC(denoted as auto).The experiments on two standard benchmarks(Coreutils and NECLA)showed that auto achieves the best MC/DC on 67.47%of Coreutils programs and 78.26%of NECLA programs. 展开更多
关键词 compiler optimization Modified condition/decision coverage(MC/DC) Optimization recommendation Symbolic execution
原文传递
SRF Coloring:Stream Register File Allocation via Graph Coloring 被引量:1
5
作者 杨学军 邓宇 +5 位作者 汪黎 晏小波 杜静 张英 王桂彬 唐滔 《Journal of Computer Science & Technology》 SCIE EI CSCD 2009年第1期152-164,共13页
Stream Register File (SRF) is a large on-chip memory of the stream processor and its efficient management is essential for good performance. Current stream programming languages expose the management of SRF to the p... Stream Register File (SRF) is a large on-chip memory of the stream processor and its efficient management is essential for good performance. Current stream programming languages expose the management of SRF to the programmer, incurring heavy burden on the programmer and bringing difficulties to inheriting the legacy codes. SF95 is the language developed for FT64 which is the first 64-bit stream processor designed for scientific applications. SF95 conceals SRF from the programmer and leaves the management of SRF to its compiler. In this paper, we present a compiler approach named SRF Coloring to manage SRF automatically. The novelties of this paper are: first, it is the first time to use the graph coloring-based algorithm for the SRF management; second, an algorithm framework for SRF Coloring that is well suited to the FT64 architecture is proposed this framework is based on a well-understood graph coloring algorithm for register allocation, together with some modifications to deal with the unusual aspects of SRF problem; third, the SRF Coloring algorithm is implemented in SF95Compiler, a compiler designed for FT64 and SF95. The experimental results show that our approach represents a practical and promising solution to SRF allocation. 展开更多
关键词 memory management SRF coloring graph coloring stream processor stream register file compiler optimization
原文传递
PMU Guided Structure Data-Layout Optimization
6
作者 闫家年 陈文光 郑纬民 《Tsinghua Science and Technology》 SCIE EI CAS 2011年第2期145-150,共6页
Existing methods of obtaining runtime feedback for structure data-layout optimization have several drawbacks, such as large overhead and difficulty composing training sets. As a result, structure data-layout optimizat... Existing methods of obtaining runtime feedback for structure data-layout optimization have several drawbacks, such as large overhead and difficulty composing training sets. As a result, structure data-layout optimization is not widely used. To overcome these drawbacks, a performance monitoring unit (PMU) sampling method was developed with much less overhead and better portability and usability. An algorithm was developed to correct incomplete and inaccurate PMU sampling. With the corrected PMU feedback, a structure data-layout optimizer achieved a 45.1% performance improvement compared to a design without data-layout optimization, which is 97.6% of the performance improvement achieved with instrumented feedback. Calculation of the PMU feedback increased the execution time by 12.3%, compared to the overhead for the instrumented feedback of 341.5%. Tests show that the PMU feedback is efficient and effective for structure data-layout optimization. 展开更多
关键词 structure data layout performance monitoring unit (PMU) compiler optimization program locality
原文传递
Compiling CIL Rewriting Language for Multiprocessors
7
作者 田新民 王鼎兴 +2 位作者 郑纬民 沈美明 李程 《Journal of Computer Science & Technology》 SCIE EI CSCD 1994年第4期302-310,共9页
The highlevel Compiler Intermediate Language CIL is a generalpurpose descripotion language of parallel graph rewriting computational model intended for parallelimplementatioll of declarative languages on multiprocesso... The highlevel Compiler Intermediate Language CIL is a generalpurpose descripotion language of parallel graph rewriting computational model intended for parallelimplementatioll of declarative languages on multiprocessor systems. In this paper, wefirst outline a new Hybrid Execution Model(HEM) and corresponding parallel abstract machine PAM/TGR based on Extended parallel Graph Rewriting ComputationalModel EGRCM for implementing CIL language on distributed memory multiprocessorsystems. Then we focus on the compiling CIL language with various optindzing techniques such as pattern matching, rule indexing, node ordering and compile-time partialscheduling. The experimental results on a 16-node Thansputer Array demonstrates the effectiveness of our model and strategies. 展开更多
关键词 Graph rewriting rewriting language optimizing compiler pattern matching rule indexing rule-group compiling MULTIPROCESSORS
原文传递
Simultaneous Minimization of Capacity and Conflict Misses
8
作者 李致远 《Journal of Computer Science & Technology》 SCIE EI CSCD 2007年第4期497-504,共8页
Loop tiling (or loop blocking) is a well-known loop transformation to improve temporal locality in nested loops which perform matrix computations. When targeting caches that have low associativities, one of the key ... Loop tiling (or loop blocking) is a well-known loop transformation to improve temporal locality in nested loops which perform matrix computations. When targeting caches that have low associativities, one of the key challenges for loop tiling is to simultaneously minimize capacity misses and conflict misses. This paper analyzes the effect of the tile size and the array-dimension size on capacity misses and conflict misses. The analysis supports the approach of combining tile-size selection (to minimize capacity misses) with array padding (to minimize conflict misses). 展开更多
关键词 CACHES optimizing compilers loop transformation array padding
原文传递
Optimized Parallel Execution of Declarative Programs on Distributed Memory Multiprocessors
9
作者 沈美明 田新民 +2 位作者 王鼎兴 郑纬民 温冬婵 《Journal of Computer Science & Technology》 SCIE EI CSCD 1993年第3期233-242,共10页
In this paper,we focus on the compiling implementation of parallel logic language PARLOG and functional language ML on distributed memory multiprocessors.Under the graph rewriting framework, a Heterogeneous Parallel G... In this paper,we focus on the compiling implementation of parallel logic language PARLOG and functional language ML on distributed memory multiprocessors.Under the graph rewriting framework, a Heterogeneous Parallel Graph Rewriting Execution Model(HPGREM)is presented firstly.Then based on HPGREM,a parallel abstract machine PAM/TGR is described.Furthermore,several optimizing compilation schemes for executing declarative programs on transputer array are proposed. The performance statistics on a transputer array demonstrate the effectiveness of our model,parallel ab- stract machine,optimizing compilation strategies and compiler. 展开更多
关键词 Declarative language parallel graph rewriting execution model optimized parallel compiler distributed memory multiprocessors parallel abstract machine
原文传递
上一页 1 下一页 到第
使用帮助 返回顶部