Abstract: An object-oriented C++ parallel compiler system, called OOCPCS, is developed to help programmers write sequential programs in C++ or Annotated C++ for parallel computation. OOCPCS is based on an integrated object-oriented paradigm and large-grain data flow model, called OOLGDFM, and automatically recognizes parallel objects using parallel compiling techniques. The paper describes the object-oriented parallel model and the realization of the system on networks.
Funding: Supported by research funding KYQD(ZR)1974 from Hainan University.
Abstract: Currently available compilation techniques target general-purpose computing and are not optimized for physical-layer computing in 5G micro base stations. In that setting, the foreseeable data sizes and small code size are application-specific opportunities for baseband algorithm optimization, yet they have received little special attention; for example, a register allocation algorithm tailored to this scenario has not been studied so far. Our focus is the compilation of baseband kernel subroutines in 5G micro base stations. For applications with known and fixed data sizes, we propose a compilation scheme for parallel data accessing in which operands are mainly allocated and kept in registers. Based on a small register group (48 × 32-bit), the scheme targets baseband algorithms operating on 4 × 4 or smaller matrices, maximizing the utilization of the register file and eliminating extra register data exchange. Meanwhile, once data is allocated into the register file, a VLIW (Very Long Instruction Word) machine is used to hide data-access latency and minimize data-access cost, so that the total execution time is minimized. Experiments indicate that for algorithms with small data sizes, the cost of data accessing and extra addressing can be minimized.
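To make the register-resident idea concrete, the following minimal sketch (not the paper's generated code) shows a fully unrolled 4 × 4 complex matrix-vector product in which every operand can live in local temporaries, the pattern a compiler would aim for when the whole working set fits the small register group; the function names and the use of std::complex<float> are illustrative assumptions.

```cpp
// Sketch: a 4x4 complex matrix-vector product written so a compiler can keep
// every operand in registers (no spills, no extra addressing). Illustrative
// only; not the paper's actual compiler output.
#include <complex>
#include <cstdio>

using cf = std::complex<float>;

// The 4x4 matrix (16 complex values) plus the input and output vectors
// (4 + 4 complex values) amount to 48 32-bit words, i.e. exactly the size of
// the small register group described above.
static void matvec4(cf a[4][4], const cf x[4], cf y[4]) {
    for (int i = 0; i < 4; ++i) {
        // Manually unrolled inner reduction: each partial sum stays in a
        // register-resident temporary.
        cf acc = a[i][0] * x[0];
        acc += a[i][1] * x[1];
        acc += a[i][2] * x[2];
        acc += a[i][3] * x[3];
        y[i] = acc;
    }
}

int main() {
    cf a[4][4], x[4], y[4];
    for (int i = 0; i < 4; ++i) {
        x[i] = cf(1.0f, float(i));
        for (int j = 0; j < 4; ++j) a[i][j] = cf(float(i + j), 0.0f);
    }
    matvec4(a, x, y);
    std::printf("y[0] = (%.1f, %.1f)\n", y[0].real(), y[0].imag());
    return 0;
}
```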
Abstract: Parallel loops account for the greatest amount of parallelism in numerical programs. Executing nested loops in parallel with low run-time overhead is thus very important for achieving high performance in parallel processing systems. However, in parallel processing systems with caches or local memories in the memory hierarchy, a "thrashing problem" may arise whenever data move back and forth between the caches or local memories of different processors. Previous techniques can only deal with the rather simple case of one linear function in a perfectly nested loop. In this paper, we present a parallel program optimizing technique called hybrid loop interchange (HLI) for cases with multiple linear functions and loop-carried data dependences in the nested loop. With HLI we can easily eliminate or reduce the thrashing phenomenon without reducing program parallelism.
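As background for the simple single-linear-function case that HLI generalizes, the sketch below shows plain loop interchange: swapping loop levels so that the loop to be parallelized partitions the array by rows, keeping each processor's working set in its own cache. Array sizes and the arithmetic in the loop body are illustrative assumptions.

```cpp
// Sketch of basic loop interchange. Before: distributing the outer j loop
// across processors makes every processor sweep the same rows of a, so data
// ping-pongs between caches ("thrashing"). After: the i loop is outermost,
// each processor owns a block of rows and reuses them locally.
#include <cstdio>
#include <vector>

constexpr int N = 512;

void column_outer(std::vector<std::vector<float>>& a) {
    for (int j = 0; j < N; ++j)       // parallelizing this loop thrashes caches
        for (int i = 0; i < N; ++i)
            a[i][j] = a[i][j] * 2.0f + 1.0f;
}

void row_outer(std::vector<std::vector<float>>& a) {
    for (int i = 0; i < N; ++i)       // after interchange: row-blocked, cache-friendly
        for (int j = 0; j < N; ++j)
            a[i][j] = a[i][j] * 2.0f + 1.0f;
}

int main() {
    std::vector<std::vector<float>> a(N, std::vector<float>(N, 1.0f));
    column_outer(a);
    row_outer(a);
    std::printf("a[0][0] = %f\n", a[0][0]);
    return 0;
}
```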
Funding: This work was partially supported by the National 863 High Technical Grant 863-306-101 and the National Doctoral Subject Foundation Grant 0249136.
Abstract: In this paper, we focus on the compiling implementation of the parallel logic language PARLOG and the functional language ML on distributed-memory multiprocessors. Under the graph rewriting framework, a Heterogeneous Parallel Graph Rewriting Execution Model (HPGREM) is presented first. Then, based on HPGREM, a parallel abstract machine, PAM/TGR, is described. Furthermore, several optimizing compilation schemes for executing declarative programs on a transputer array are proposed. Performance statistics on a transputer array demonstrate the effectiveness of our model, parallel abstract machine, optimizing compilation strategies, and compiler.
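For readers unfamiliar with graph rewriting as an execution model, the following minimal sketch shows a single in-place rewrite step: a reducible node whose arguments are already values is overwritten by its result, so every reference to that node sees the reduced value. The node layout and the "Add" rule are illustrative assumptions, not the PAM/TGR instruction set.

```cpp
// Minimal graph-rewriting step: Add(Num a, Num b) => Num (a + b), rewritten
// in place so shared references observe the result. Illustrative only.
#include <cstdio>
#include <memory>

struct Node {
    enum class Tag { Num, Add } tag;
    int value;                           // meaningful when tag == Num
    std::shared_ptr<Node> left, right;   // meaningful when tag == Add
};

// Perform one rewrite step if the node matches the Add rule.
bool rewrite_step(Node& n) {
    if (n.tag == Node::Tag::Add &&
        n.left->tag == Node::Tag::Num && n.right->tag == Node::Tag::Num) {
        n.value = n.left->value + n.right->value;
        n.tag = Node::Tag::Num;
        n.left.reset();
        n.right.reset();
        return true;
    }
    return false;
}

int main() {
    auto one = std::make_shared<Node>(Node{Node::Tag::Num, 1, nullptr, nullptr});
    auto two = std::make_shared<Node>(Node{Node::Tag::Num, 2, nullptr, nullptr});
    Node root{Node::Tag::Add, 0, one, two};
    rewrite_step(root);                  // root becomes Num 3
    std::printf("root = %d\n", root.value);
    return 0;
}
```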
Abstract: At present, commercial parallel computer systems with distributed-memory architecture are usually provided with parallel FORTRAN or parallel C compilers, which are just traditional sequential FORTRAN or C compilers extended with communication statements. Programmers suffer from writing parallel programs with explicit communication statements. The Shared Variable Oriented Parallel Precompiler (SVOPP) proposed in this paper can automatically generate appropriate communication statements based on shared variables for the SPMD (Single Program Multiple Data) computation model and greatly ease parallel programming while achieving high communication efficiency. The core functions of the parallel C precompiler have been successfully verified on a transputer-based parallel computer. Its prominent performance shows that SVOPP is probably a breakthrough in parallel programming technique.
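As an illustration of the kind of transformation such a precompiler performs, the hand-written sketch below mimics its effect for one shared variable in an SPMD program: the owning process computes the value and an inserted communication statement propagates it to the others. SVOPP targeted a transputer-based machine, so the use of MPI here is only a familiar stand-in, and the variable names are hypothetical.

```cpp
// Hand-written analogy of precompiler output: a shared variable written by
// one process is propagated to all others by an automatically inserted
// broadcast. (MPI is used only as a stand-in for transputer communication.)
#include <mpi.h>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // "Shared" variable as seen in the original sequential-looking program.
    double alpha = 0.0;

    if (rank == 0) {
        alpha = 3.14;   // only the owner computes the value
    }
    // Communication statement a precompiler would generate automatically:
    MPI_Bcast(&alpha, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    std::printf("rank %d sees alpha = %f\n", rank, alpha);
    MPI_Finalize();
    return 0;
}
```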
Abstract: In this paper, a new loop transformation is proposed that can exploit parallelism in loops where it cannot be found by traditional methods. The method is then extended to show how to achieve the maximum speedup of loops if there are infinitely many processors, and how to balance the workload of parallel sections in loops if there is a fixed number of processors.
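To show the general flavor of exposing parallelism by restructuring a loop, the sketch below uses a textbook transformation, loop distribution: the loop-carried dependence is confined to one statement, so after splitting, the second loop is fully parallel. This is a standard example chosen for illustration, not the specific transformation proposed in the paper.

```cpp
// Loop distribution: the recurrence through a[] stays serial, while the
// independent updates of b[] become a separate, fully parallel loop.
#include <cstddef>
#include <cstdio>
#include <vector>

void original(std::vector<float>& a, std::vector<float>& b,
              const std::vector<float>& c) {
    for (std::size_t i = 1; i < a.size(); ++i) {
        a[i] = a[i - 1] + c[i];   // S1: loop-carried dependence (serial)
        b[i] = b[i] * c[i];       // S2: independent iterations
    }
}

void distributed(std::vector<float>& a, std::vector<float>& b,
                 const std::vector<float>& c) {
    for (std::size_t i = 1; i < a.size(); ++i)   // serial recurrence stays serial
        a[i] = a[i - 1] + c[i];
    // The following loop now has no loop-carried dependence and can be
    // executed in parallel (e.g., under a parallel-for construct).
    for (std::size_t i = 1; i < b.size(); ++i)
        b[i] = b[i] * c[i];
}

int main() {
    std::vector<float> a(8, 1.0f), b(8, 2.0f), c(8, 3.0f);
    distributed(a, b, c);
    std::printf("a[7] = %f, b[7] = %f\n", a[7], b[7]);
    return 0;
}
```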
Abstract: This paper presents a model for an automatically parallelizing compiler based on C++ that consists of compile-time and run-time parallelizing facilities. The paper also describes a method for finding both intra-object and inter-object parallelism. The parallelism detection is completely transparent to users.
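The sketch below illustrates what inter-object parallelism means in practice: two method invocations on independent objects share no state, so they may run concurrently. Here the concurrency is written out explicitly with std::async only to make the opportunity visible; the class and method names are illustrative and not taken from the paper's model.

```cpp
// Inter-object parallelism: a.sum() and b.sum() touch disjoint object state,
// so a parallelizing compiler could schedule them concurrently. std::async is
// used here only to demonstrate the exploitable independence.
#include <cstdio>
#include <future>
#include <numeric>
#include <vector>

class Accumulator {
public:
    explicit Accumulator(std::size_t n) : data_(n, 1.0) {}
    double sum() const { return std::accumulate(data_.begin(), data_.end(), 0.0); }
private:
    std::vector<double> data_;
};

int main() {
    Accumulator a(1 << 20), b(1 << 20);
    auto fa = std::async(std::launch::async, [&] { return a.sum(); });
    auto fb = std::async(std::launch::async, [&] { return b.sum(); });
    std::printf("total = %f\n", fa.get() + fb.get());
    return 0;
}
```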
Funding: This work was supported by the National Natural Science Foundation of China (Grant No. 69933020) and the "973" Program (Grant No. G1999032702).
Abstract: This paper discusses compile-time task scheduling of parallel programs running on clusters of SMP workstations. First, the problem is stated formally, transformed into a graph partition problem, and proved to be NP-complete. A heuristic algorithm, MMP-Solver, is then proposed to solve the problem. Experimental results show that the task scheduling can greatly reduce the communication overhead of parallel applications and that MMP-Solver outperforms existing algorithms.
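To give a feel for the shape of the scheduling problem, the sketch below places weighted tasks onto SMP nodes with a generic greedy "least-loaded node" rule. It is only an illustration of task-to-node partitioning under stated, invented weights; it is not the MMP-Solver heuristic, which additionally accounts for communication edges between tasks.

```cpp
// Generic greedy placement: each task goes to the currently least-loaded
// node. Task weights and node count are illustrative assumptions.
#include <algorithm>
#include <cstdio>
#include <vector>

int main() {
    const std::vector<double> task_weight = {4.0, 2.5, 2.0, 1.5, 1.0, 1.0};
    const int num_nodes = 2;

    std::vector<double> load(num_nodes, 0.0);
    std::vector<int> placement(task_weight.size(), -1);

    for (std::size_t t = 0; t < task_weight.size(); ++t) {
        int best = static_cast<int>(
            std::min_element(load.begin(), load.end()) - load.begin());
        placement[t] = best;
        load[best] += task_weight[t];
    }
    for (std::size_t t = 0; t < placement.size(); ++t)
        std::printf("task %zu -> node %d\n", t, placement[t]);
    return 0;
}
```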
Abstract: Parallel loops account for the greatest amount of parallelism in numerical programs. Executing nested loops in parallel with low run-time overhead is thus very important for achieving high performance in parallel processing systems. However, in parallel processing systems with caches or local memories in the memory hierarchy, a "thrashing problem" may arise when data move back and forth frequently between the caches or local memories of different processors. The parallel-compiler techniques for solving this problem are not yet fully developed. In this paper, we present two restructuring techniques: loop staggering, and loop staggering and compacting. With them we can not only significantly reduce cache or local-memory thrashing but also exploit the potential parallelism in the outer serial loop. Loop staggering benefits dynamic loop scheduling strategies, whereas loop staggering and compacting is good for static loop scheduling strategies. Our method especially benefits parallel programs in which a parallel loop is enclosed by a serial loop and array elements are repeatedly used in different iterations of the parallel loop.
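The conceptual sketch below (executed serially for simplicity) illustrates the intent of staggering under an assumed shifted access pattern a[i + t]: with a plain cyclic mapping (i % P == p), the elements a processor touches shift by one on every outer iteration and migrate between caches, whereas a staggered mapping ((i + t) % P == p) keeps each processor on the same elements across the serial loop. The access pattern, mapping, and names are assumptions for illustration; the paper's exact transformation and applicability conditions may differ.

```cpp
// Conceptual illustration of a staggered iteration-to-processor mapping.
// Each value of p stands for a conceptual processor; the run is serial here.
#include <cstdio>
#include <vector>

int main() {
    const int P = 4;                  // conceptual processor count
    const int N = 16;                 // parallel-loop trip count
    const int T = 3;                  // serial outer-loop trip count
    std::vector<double> a(N + T, 0.0);

    for (int t = 0; t < T; ++t) {             // serial outer loop
        for (int p = 0; p < P; ++p) {         // conceptually parallel processors
            for (int i = 0; i < N; ++i) {
                if ((i + t) % P != p) continue;   // staggered ownership test
                a[i + t] += 1.0;      // processor p reuses the same residue
                                      // class of elements at every t
            }
        }
    }
    std::printf("a[4] = %.1f\n", a[4]);
    return 0;
}
```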