This paper describes an efficient solution to parallelize softwareprogram instructions, regardless of the programming language in which theyare written. We solve the problem of the optimal distribution of a set ofinst...This paper describes an efficient solution to parallelize softwareprogram instructions, regardless of the programming language in which theyare written. We solve the problem of the optimal distribution of a set ofinstructions on available processors. We propose a genetic algorithm to parallelize computations, using evolution to search the solution space. The stagesof our proposed genetic algorithm are: The choice of the initial populationand its representation in chromosomes, the crossover, and the mutation operations customized to the problem being dealt with. In this paper, geneticalgorithms are applied to the entire search space of the parallelization ofthe program instructions problem. This problem is NP-complete, so thereare no polynomial algorithms that can scan the solution space and solve theproblem. The genetic algorithm-based method is general and it is simple andefficient to implement because it can be scaled to a larger or smaller number ofinstructions that must be parallelized. The parallelization technique proposedin this paper was developed in the C# programming language, and our resultsconfirm the effectiveness of our parallelization method. Experimental resultsobtained and presented for different working scenarios confirm the theoreticalresults, and they provide insight on how to improve the exploration of a searchspace that is too large to be searched exhaustively.展开更多
This paper presents a parameterized instruction scheduling algorithm based on machine description table for TH-RISC system, having a (3-5) stages pipeline structure.It would provide considerable fiexibility for instru...This paper presents a parameterized instruction scheduling algorithm based on machine description table for TH-RISC system, having a (3-5) stages pipeline structure.It would provide considerable fiexibility for instruction scheduling, improving execution efficiency for rapidly upgrading RISC machines. Alld, using this instruction scheduler as a tool, the effect of several methods for solving instruction interlock problem has been analyzed. Finally, a high performance approach combining the hardware feasibility and software effectiveness for solving instruction interlock problem, the improvement of instruction level parallelism (ILP) and speed-up results are given.The algorithm complexity is O(n2).展开更多
This paper uses timed Petri net to model and analyze the problem of instructionlevel loop scheduling with resource constraints, which has been proven to be an NP complete problem. First, we present a new timed Petri n...This paper uses timed Petri net to model and analyze the problem of instructionlevel loop scheduling with resource constraints, which has been proven to be an NP complete problem. First, we present a new timed Petri net model to integrate functional unit allocation, register allocation and spilling ilno a unified theoretical framework.Then we develop a state subgraph, called Register Allocation Solution Graph, which can effectively describe the major behavior of our new model. The maill property of this state subgraph is that the number of all its nodes is polynomial. Finally we present and prove that the optimum loop schedules can be found with polynomial computation complexity, for almost all practical loop prograrns. Our work lightens a new idea of finding the optimum loop schedules.展开更多
It can be observed from looking backward that processor architecture is improved through spirally shifting from simple to complex and from complex to simple. Nowadays we are facing another shifting from complex to sim...It can be observed from looking backward that processor architecture is improved through spirally shifting from simple to complex and from complex to simple. Nowadays we are facing another shifting from complex to simple, and new innovative architecture will emerge to utilize the continuously increasing transistor budgets. The growing importance of wire delays, changing workloads, power consumption, and design/verification complexity will drive the forthcoming era of Chip Multiprocessors (CMPs). Furthermore, typical CMP projects both from industries and from academics are investigated. Through going into depths for some primary theoretical and implementation problems of CMPs, the great challenges and opportunities to future CMPs are presented and discussed. Finally, the Godson series microprocessors designed in China are introduced.展开更多
We studied the architecture of embedded computing systems from the viewpoint of power consumption in memory systems and used a selective-code-compression (SCC) approach to realize our design.Based on the LZW (Lempel-Z...We studied the architecture of embedded computing systems from the viewpoint of power consumption in memory systems and used a selective-code-compression (SCC) approach to realize our design.Based on the LZW (Lempel-Ziv-Welch) compression algorithm,we propose a novel cost effective compression and decompression method.The goal of our study was to develop a new SCC approach with an extended decision policy based on the prediction of power consumption.Our decompression method had to be easily implemented in hardware and to collaborate with the embedded processor.The hardware implementation of our decompression engine uses the TSMC 0.18μm-2p6m model and its cell-based libraries.To calculate power consumption more accurately,we used a static analysis method to estimate the power overhead of the decompression engine.We also used variable sized branch blocks and considered several features of very long instruction word (VLIW) processors for our compression,including the instruction level parallelism (ILP) technique and the scheduling of instructions.Our code-compression methods are not limited to VLIW machines,and can be applied to other kinds of reduced instruction set computer (RISC) architecture.展开更多
文摘This paper describes an efficient solution to parallelize softwareprogram instructions, regardless of the programming language in which theyare written. We solve the problem of the optimal distribution of a set ofinstructions on available processors. We propose a genetic algorithm to parallelize computations, using evolution to search the solution space. The stagesof our proposed genetic algorithm are: The choice of the initial populationand its representation in chromosomes, the crossover, and the mutation operations customized to the problem being dealt with. In this paper, geneticalgorithms are applied to the entire search space of the parallelization ofthe program instructions problem. This problem is NP-complete, so thereare no polynomial algorithms that can scan the solution space and solve theproblem. The genetic algorithm-based method is general and it is simple andefficient to implement because it can be scaled to a larger or smaller number ofinstructions that must be parallelized. The parallelization technique proposedin this paper was developed in the C# programming language, and our resultsconfirm the effectiveness of our parallelization method. Experimental resultsobtained and presented for different working scenarios confirm the theoreticalresults, and they provide insight on how to improve the exploration of a searchspace that is too large to be searched exhaustively.
文摘This paper presents a parameterized instruction scheduling algorithm based on machine description table for TH-RISC system, having a (3-5) stages pipeline structure.It would provide considerable fiexibility for instruction scheduling, improving execution efficiency for rapidly upgrading RISC machines. Alld, using this instruction scheduler as a tool, the effect of several methods for solving instruction interlock problem has been analyzed. Finally, a high performance approach combining the hardware feasibility and software effectiveness for solving instruction interlock problem, the improvement of instruction level parallelism (ILP) and speed-up results are given.The algorithm complexity is O(n2).
文摘This paper uses timed Petri net to model and analyze the problem of instructionlevel loop scheduling with resource constraints, which has been proven to be an NP complete problem. First, we present a new timed Petri net model to integrate functional unit allocation, register allocation and spilling ilno a unified theoretical framework.Then we develop a state subgraph, called Register Allocation Solution Graph, which can effectively describe the major behavior of our new model. The maill property of this state subgraph is that the number of all its nodes is polynomial. Finally we present and prove that the optimum loop schedules can be found with polynomial computation complexity, for almost all practical loop prograrns. Our work lightens a new idea of finding the optimum loop schedules.
基金Supported by the National Natural Science Foundation of China for Distinguished Young Scholar under Grant No. 60325205 the National High Technology Development 863 Program of China under Grants No. 2002AA110010, No. 2005AA110010 No. 2005AA119020, and the National Grand Fundamental Research 973 Program of China under Grant No. 2005CB321600.
文摘It can be observed from looking backward that processor architecture is improved through spirally shifting from simple to complex and from complex to simple. Nowadays we are facing another shifting from complex to simple, and new innovative architecture will emerge to utilize the continuously increasing transistor budgets. The growing importance of wire delays, changing workloads, power consumption, and design/verification complexity will drive the forthcoming era of Chip Multiprocessors (CMPs). Furthermore, typical CMP projects both from industries and from academics are investigated. Through going into depths for some primary theoretical and implementation problems of CMPs, the great challenges and opportunities to future CMPs are presented and discussed. Finally, the Godson series microprocessors designed in China are introduced.
基金Project (No.97-2218-E-011-016-) supported by the National Science Council
文摘We studied the architecture of embedded computing systems from the viewpoint of power consumption in memory systems and used a selective-code-compression (SCC) approach to realize our design.Based on the LZW (Lempel-Ziv-Welch) compression algorithm,we propose a novel cost effective compression and decompression method.The goal of our study was to develop a new SCC approach with an extended decision policy based on the prediction of power consumption.Our decompression method had to be easily implemented in hardware and to collaborate with the embedded processor.The hardware implementation of our decompression engine uses the TSMC 0.18μm-2p6m model and its cell-based libraries.To calculate power consumption more accurately,we used a static analysis method to estimate the power overhead of the decompression engine.We also used variable sized branch blocks and considered several features of very long instruction word (VLIW) processors for our compression,including the instruction level parallelism (ILP) technique and the scheduling of instructions.Our code-compression methods are not limited to VLIW machines,and can be applied to other kinds of reduced instruction set computer (RISC) architecture.