This paper focuses on the design process for reconfigurable architecture. Our contribution focuses on introducing a new temporal partitioning algorithm. Our algorithm is based on typical mathematic flow to solve the t...This paper focuses on the design process for reconfigurable architecture. Our contribution focuses on introducing a new temporal partitioning algorithm. Our algorithm is based on typical mathematic flow to solve the temporal partitioning problem. This algorithm optimizes the transfer of data required between design partitions and the reconfiguration overhead. Results show that our algorithm considerably decreases the communication cost and the latency compared with other well known algorithms.展开更多
Reconfigurable computing systems can be reconfigured at runtime and support partial reconfigurability which makes us able to execute tasks in a true multitasking manner. To manage such systems at runtime, a reconfigur...Reconfigurable computing systems can be reconfigured at runtime and support partial reconfigurability which makes us able to execute tasks in a true multitasking manner. To manage such systems at runtime, a reconfigurable operating system is needed. The main part of this operating system is resource management unit which performs on-line scheduling and placement of hardware tasks at runtime. Reconfiguration overhead is an important obstacle that limits the performance of on-line scheduling algorithms in reconfigurable computing systems and increases the overall execution time. Configuration reusing (task reusing) can decrease reconfiguration overhead considerably, particularly in periodic applications or the applications in which the probability of tasks recurrence is high. In this paper, we present a technique called reusing-based scheduling (RBS), for on-line scheduling and placement in which configuration reusing is considered as a main characteristic in order to reduce reconfiguration overhead and decrease total execution time of the tasks. Several experiments have been conducted on the proposed algorithm. Obtained results show considerable improvement in overall execution time of the tasks.展开更多
As a computing paradigm that combines temporal and spatial computations,dynamic reconfigurable computing provides superiorities of flexibility,energy efficiency and area efficiency,attracting interest from both academ...As a computing paradigm that combines temporal and spatial computations,dynamic reconfigurable computing provides superiorities of flexibility,energy efficiency and area efficiency,attracting interest from both academia and industry.However,dynamic reconfigurable computing is not yet mature because of several unsolved problems.This work introduces the concept,architecture,and compilation techniques of dynamic reconfigurable computing.It also discusses the existing major challenges and points out its potential applications.展开更多
This paper describes a new specialized Reconfigurable Cryptographic for Block ciphersArchitecture(RCBA).Application-specific computation pipelines can be configured according to thecharacteristics of the block cipher ...This paper describes a new specialized Reconfigurable Cryptographic for Block ciphersArchitecture(RCBA).Application-specific computation pipelines can be configured according to thecharacteristics of the block cipher processing in RCBA,which delivers high performance for crypto-graphic applications.RCBA adopts a coarse-grained reconfigurable architecture that mixes the ap-propriate amount of static configurations with dynamic configurations.RCBA has been implementedbased on Altera’s FPGA,and representative algorithms of block cipher such as DES,Rijndael and RC6have been mapped on RCBA architecture successfully.System performance has been analyzed,andfrom the analysis it is demonstrated that the RCBA architecture can achieve more flexibility and ef-ficiency when compared with other implementations.展开更多
The concept and advantage of reconfigurable technology is introduced. A kind of processor architecture of re configurable macro processor (RMP) model based on FPGA array and DSP is put forward and has been implemented...The concept and advantage of reconfigurable technology is introduced. A kind of processor architecture of re configurable macro processor (RMP) model based on FPGA array and DSP is put forward and has been implemented. Two image algorithms are developed: template-based automatic target recognition and zone labeling. One is estimating for motion direction in the infrared image background, another is line picking-up algorithm based on image zone labeling and phase grouping technique. It is a kind of 'hardware' function that can be called by the DSP in high-level algorithm. It is also a kind of hardware algorithm of the DSP. The results of experiments show the reconfigurable computing technology based on RMP is an ideal accelerating means to deal with the high-speed image processing tasks. High real time performance is obtained in our two applications on RMP.展开更多
Reconfigurable computing tries to achieve the balance between high efficiency of custom computing and flexibility of general-purpose computing. This paper presents the implementation techniques in LEAP, a coarse-grain...Reconfigurable computing tries to achieve the balance between high efficiency of custom computing and flexibility of general-purpose computing. This paper presents the implementation techniques in LEAP, a coarse-grained reconfigurable array, and proposes a speculative execution mechanism for dynamic loop scheduling with the goal of one iteration per cycle and implementation techniques to support decoupling synchronization between the token generator and the collector. This paper also in- troduces the techniques of exploiting both data dependences of intra- and inter-iteration, with the help of two instructions for special data reuses in the loop-carried dependences. The experimental results show that the number of memory accesses reaches on average 3% of an RISC processor simulator with no memory optimization. In a practical image matching application, LEAP architecture achieves about 34 times of speedup in execution cycles, compared with general-purpose processors.展开更多
To apply a quasi-cyclic low density parity check(QC-LDPC)to different scenarios,a data-stream driven pipelined macro instruction set and a reconfigurable processor architecture are proposed for the typical QC-LDPC alg...To apply a quasi-cyclic low density parity check(QC-LDPC)to different scenarios,a data-stream driven pipelined macro instruction set and a reconfigurable processor architecture are proposed for the typical QC-LDPC algorithm.The data-level parallelism is improved by instructions to dynamically configure the multi-core computing units.Simultaneously,an intelligent adjustment strategy based on a programmable wake-up controller(WuC)is designed so that the computing mode,operating voltage,and frequency of the QC-LDPC algorithm can be adjusted.This adjustment can improve the computing efficiency of the processor.The QC-LDPC processors are verified on the Xilinx ZCU102 field programmable gate array(FPGA)board and the computing efficiency is measured.The experimental results indicate that the QC-LDPC processor can support two encoding lengths of three typical QC-LDPC algorithms and 20 adaptive operating modes of operating voltage and frequency.The maximum efficiency can reach up to 12.18 Gbit/(s·W),which is more flexible than existing state-of-the-art processors for QC-LDPC.展开更多
In this short article, we would like to introduce the Center for Domain-Specific Computing (CDSC) established in 2009, primarily funded by the US National Science Foundation with an award from the 2009 Expeditions i...In this short article, we would like to introduce the Center for Domain-Specific Computing (CDSC) established in 2009, primarily funded by the US National Science Foundation with an award from the 2009 Expeditions in Computing Program. In this project we look beyond parallelization and focus on customization as the next disruptive technology to bring orders-of-magnitude power-performance efficiency improvement for applications in a specific domain.展开更多
In order to take into account the computing efficiency and flexibility of calculating transcendental functions, this paper proposes one kind of reconfigurable transcendental function generator. The generator is of a r...In order to take into account the computing efficiency and flexibility of calculating transcendental functions, this paper proposes one kind of reconfigurable transcendental function generator. The generator is of a reconfigurable array structure composed of 30 processing elements (PEs). The coordinate rotational digital computer (CORDIC) algorithm is implemented on this structure. Different functions, such as sine, cosine, inverse tangent, logarithmic, etc., can be calculated based on the structure by reconfiguring the functions of PEs. The functional simulation and field programmable gate array (FPGA) verification show that the proposed method obtains great flexibility with acceptable performance.展开更多
In this paper,domain decomposition method(DDM) for numerical solutions of mathematical physics equations is improved into dynamic domain decomposition method(DDDM) . The main feature of the DDDM is that the number...In this paper,domain decomposition method(DDM) for numerical solutions of mathematical physics equations is improved into dynamic domain decomposition method(DDDM) . The main feature of the DDDM is that the number,shape and volume of the sub-domains are all flexibly changeable during the iterations,so it suits well to be implemented on a reconfigurable parallel computing system. Convergence analysis of the DDDM is given,while an application approach to a weak nonlinear elliptic boundary value problem and a numerical experiment are discussed.展开更多
文摘This paper focuses on the design process for reconfigurable architecture. Our contribution focuses on introducing a new temporal partitioning algorithm. Our algorithm is based on typical mathematic flow to solve the temporal partitioning problem. This algorithm optimizes the transfer of data required between design partitions and the reconfiguration overhead. Results show that our algorithm considerably decreases the communication cost and the latency compared with other well known algorithms.
基金Supported by a grant from Iran Telecommunication Research Center
文摘Reconfigurable computing systems can be reconfigured at runtime and support partial reconfigurability which makes us able to execute tasks in a true multitasking manner. To manage such systems at runtime, a reconfigurable operating system is needed. The main part of this operating system is resource management unit which performs on-line scheduling and placement of hardware tasks at runtime. Reconfiguration overhead is an important obstacle that limits the performance of on-line scheduling algorithms in reconfigurable computing systems and increases the overall execution time. Configuration reusing (task reusing) can decrease reconfiguration overhead considerably, particularly in periodic applications or the applications in which the probability of tasks recurrence is high. In this paper, we present a technique called reusing-based scheduling (RBS), for on-line scheduling and placement in which configuration reusing is considered as a main characteristic in order to reduce reconfiguration overhead and decrease total execution time of the tasks. Several experiments have been conducted on the proposed algorithm. Obtained results show considerable improvement in overall execution time of the tasks.
基金supported in part by the National Science and Technology Major Project of the Ministry of Science and Technology of China (Grant No. 2018ZX01028201)in part by the National Natural Science Foundation of China (Grant No. 61672317, No. 61834002)in part by the National Key R&D Program of China (Grant No. 2018YFB2202101)
文摘As a computing paradigm that combines temporal and spatial computations,dynamic reconfigurable computing provides superiorities of flexibility,energy efficiency and area efficiency,attracting interest from both academia and industry.However,dynamic reconfigurable computing is not yet mature because of several unsolved problems.This work introduces the concept,architecture,and compilation techniques of dynamic reconfigurable computing.It also discusses the existing major challenges and points out its potential applications.
文摘This paper describes a new specialized Reconfigurable Cryptographic for Block ciphersArchitecture(RCBA).Application-specific computation pipelines can be configured according to thecharacteristics of the block cipher processing in RCBA,which delivers high performance for crypto-graphic applications.RCBA adopts a coarse-grained reconfigurable architecture that mixes the ap-propriate amount of static configurations with dynamic configurations.RCBA has been implementedbased on Altera’s FPGA,and representative algorithms of block cipher such as DES,Rijndael and RC6have been mapped on RCBA architecture successfully.System performance has been analyzed,andfrom the analysis it is demonstrated that the RCBA architecture can achieve more flexibility and ef-ficiency when compared with other implementations.
文摘The concept and advantage of reconfigurable technology is introduced. A kind of processor architecture of re configurable macro processor (RMP) model based on FPGA array and DSP is put forward and has been implemented. Two image algorithms are developed: template-based automatic target recognition and zone labeling. One is estimating for motion direction in the infrared image background, another is line picking-up algorithm based on image zone labeling and phase grouping technique. It is a kind of 'hardware' function that can be called by the DSP in high-level algorithm. It is also a kind of hardware algorithm of the DSP. The results of experiments show the reconfigurable computing technology based on RMP is an ideal accelerating means to deal with the high-speed image processing tasks. High real time performance is obtained in our two applications on RMP.
基金Supported by the National Natural Science Foundation of China (Grant No. 60633050, 60621003)the National High Technology Researchand Development Program of China (Grant No. 2007AA01Z06)
文摘Reconfigurable computing tries to achieve the balance between high efficiency of custom computing and flexibility of general-purpose computing. This paper presents the implementation techniques in LEAP, a coarse-grained reconfigurable array, and proposes a speculative execution mechanism for dynamic loop scheduling with the goal of one iteration per cycle and implementation techniques to support decoupling synchronization between the token generator and the collector. This paper also in- troduces the techniques of exploiting both data dependences of intra- and inter-iteration, with the help of two instructions for special data reuses in the loop-carried dependences. The experimental results show that the number of memory accesses reaches on average 3% of an RISC processor simulator with no memory optimization. In a practical image matching application, LEAP architecture achieves about 34 times of speedup in execution cycles, compared with general-purpose processors.
基金the National Key Research and Development Program of China(2019YFB1803600)the Key Scientific Research Program of Shaanxi Provincial Department of Education(22JY059)the China Civil Aviation Airworthiness Center Open Foundation(SH2021111903)。
文摘To apply a quasi-cyclic low density parity check(QC-LDPC)to different scenarios,a data-stream driven pipelined macro instruction set and a reconfigurable processor architecture are proposed for the typical QC-LDPC algorithm.The data-level parallelism is improved by instructions to dynamically configure the multi-core computing units.Simultaneously,an intelligent adjustment strategy based on a programmable wake-up controller(WuC)is designed so that the computing mode,operating voltage,and frequency of the QC-LDPC algorithm can be adjusted.This adjustment can improve the computing efficiency of the processor.The QC-LDPC processors are verified on the Xilinx ZCU102 field programmable gate array(FPGA)board and the computing efficiency is measured.The experimental results indicate that the QC-LDPC processor can support two encoding lengths of three typical QC-LDPC algorithms and 20 adaptive operating modes of operating voltage and frequency.The maximum efficiency can reach up to 12.18 Gbit/(s·W),which is more flexible than existing state-of-the-art processors for QC-LDPC.
基金funded by the US NSF Expedition in Computing Award CCF-0926127
文摘In this short article, we would like to introduce the Center for Domain-Specific Computing (CDSC) established in 2009, primarily funded by the US National Science Foundation with an award from the 2009 Expeditions in Computing Program. In this project we look beyond parallelization and focus on customization as the next disruptive technology to bring orders-of-magnitude power-performance efficiency improvement for applications in a specific domain.
基金supported by the National Natural Science Foundation of China(61272120,61602377,61634004)the Natural Science Foundation of Shaanxi Province of China(2015JM6326)+1 种基金Shaanxi Provincial Co-ordination Innovation Project of Science and Technology(2016KTZDGY02-04-02)the Project of Education Department of Shaanxi Provincial Government(15JK1683)
文摘In order to take into account the computing efficiency and flexibility of calculating transcendental functions, this paper proposes one kind of reconfigurable transcendental function generator. The generator is of a reconfigurable array structure composed of 30 processing elements (PEs). The coordinate rotational digital computer (CORDIC) algorithm is implemented on this structure. Different functions, such as sine, cosine, inverse tangent, logarithmic, etc., can be calculated based on the structure by reconfiguring the functions of PEs. The functional simulation and field programmable gate array (FPGA) verification show that the proposed method obtains great flexibility with acceptable performance.
基金Supported by the Foundation of National Defence Key Laboratory (51484020305JW1206)
文摘In this paper,domain decomposition method(DDM) for numerical solutions of mathematical physics equations is improved into dynamic domain decomposition method(DDDM) . The main feature of the DDDM is that the number,shape and volume of the sub-domains are all flexibly changeable during the iterations,so it suits well to be implemented on a reconfigurable parallel computing system. Convergence analysis of the DDDM is given,while an application approach to a weak nonlinear elliptic boundary value problem and a numerical experiment are discussed.