Abstract: Allocation is one of the main tasks in high-level synthesis. It includes functional unit (module) allocation, storage allocation, and interconnection allocation. This paper models the allocation problem as cluster analysis and applies a new algorithm, the neighbor state transition (NST) algorithm, for cluster optimization. It is proved that the algorithm produces an asymptotically global optimal solution with an upper bound on the cost function of (1 + O(1/n^(2-ε)))F*, where F* is the cost of the optimal solution, n is the problem size, and ε is a positive parameter arbitrarily close to zero. Numerical examples show that the NST algorithm produces better results than other known methods.
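The NST algorithm itself is not reproduced in the abstract, but the cluster-analysis view of allocation can be illustrated with a small sketch: operations whose lifetimes never overlap are greedily grouped into one cluster, and each cluster becomes one functional unit. Everything here (the example data, the compatibility test, the greedy grouping) is an assumption for illustration, not the paper's method.

```cpp
// Illustrative sketch only: greedy clustering of operations onto functional
// units based on non-overlapping lifetimes. This is NOT the NST algorithm;
// it only shows how allocation can be viewed as grouping compatible operations.
#include <iostream>
#include <vector>

struct Operation {
    int id;
    int start;  // first control step in which the operation is alive
    int end;    // last control step
};

// Two operations can share a functional unit if their lifetimes do not overlap.
static bool compatible(const Operation& a, const Operation& b) {
    return a.end < b.start || b.end < a.start;
}

int main() {
    std::vector<Operation> ops = {
        {0, 1, 2}, {1, 3, 4}, {2, 1, 3}, {3, 4, 5}, {4, 2, 2}};

    // Each cluster corresponds to one functional unit.
    std::vector<std::vector<Operation>> clusters;
    for (const auto& op : ops) {
        bool placed = false;
        for (auto& cluster : clusters) {
            bool ok = true;
            for (const auto& member : cluster)
                if (!compatible(op, member)) { ok = false; break; }
            if (ok) { cluster.push_back(op); placed = true; break; }
        }
        if (!placed) clusters.push_back({op});
    }

    for (size_t u = 0; u < clusters.size(); ++u) {
        std::cout << "FU" << u << ":";
        for (const auto& op : clusters[u]) std::cout << " op" << op.id;
        std::cout << "\n";
    }
    return 0;
}
```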
Funding: Supported by the National S&T Major Project (Nos. 2011ZX03003-003-01 and 2011ZX03004-004) and the National Basic Research Program of China (No. 2012CB316002).
Abstract: On-board processing (OBP) satellite systems have attracted more and more attention in recent years because of their high efficiency and performance. However, OBP transponders are very sensitive to high-energy particles in the space radiation environment. Single event upset (SEU) is one of the major radiation effects and greatly affects satellite reliability. Triple modular redundancy (TMR) is a classic and efficient method to mask SEUs. However, because TMR uses three identical modules plus comparison logic, the circuit size becomes unacceptable, especially in resource-limited environments such as OBP systems. Considering this, a new SEU-tolerant method based on residue codes and high-level synthesis (HLS) is proposed and applied to FIR filters, which are typical structures in OBP systems. Simulation results show that, for an applicable HLS scheduling scheme, the area can be reduced by 48.26% compared with TMR, while the fault missing rate is 0.15%.
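As an illustration of the residue-code idea (not the paper's exact scheme), the sketch below attaches a mod-3 residue check to a FIR multiply-accumulate: a cheap residue datapath mirrors the main computation, and a mismatch between the predicted and actual residue flags a fault such as an SEU. The modulus, data types, and function names are assumptions.

```cpp
// Illustrative sketch: mod-3 residue checking of a FIR multiply-accumulate.
// The residue of the result is predicted independently from the residues of
// the operands; a mismatch signals a fault (e.g., an SEU) in the main datapath.
#include <cstdint>
#include <iostream>
#include <vector>

constexpr int kModulus = 3;  // residue base, chosen only for illustration

static int residue(int64_t x) {
    int r = static_cast<int>(x % kModulus);
    return r < 0 ? r + kModulus : r;
}

// FIR output plus a predicted residue computed by a separate, cheaper path.
struct CheckedResult {
    int64_t value;
    int predicted_residue;
};

CheckedResult fir_with_residue(const std::vector<int>& x, const std::vector<int>& h) {
    int64_t acc = 0;
    int res_acc = 0;
    for (size_t i = 0; i < h.size() && i < x.size(); ++i) {
        acc += static_cast<int64_t>(x[i]) * h[i];
        // Residue arithmetic mirrors the main computation with small operands.
        res_acc = (res_acc + residue(x[i]) * residue(h[i])) % kModulus;
    }
    return {acc, res_acc};
}

int main() {
    std::vector<int> x = {3, -1, 4, 1, 5};
    std::vector<int> h = {2, 0, -6, 2, 1};
    CheckedResult r = fir_with_residue(x, h);
    bool fault_detected = residue(r.value) != r.predicted_residue;
    std::cout << "y = " << r.value
              << ", fault detected: " << std::boolalpha << fault_detected << "\n";
    return 0;
}
```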
Funding: This work was financially supported by the National Natural Science Foundation of China (No. 20476008).
Abstract: Strontium titanate synroc samples were synthesized by self-propagating high-temperature synthesis (SHS). Sr took part directly in the synthesis process; as a result, the loading content issue is basically resolved. The products were characterized by density, microhardness, X-ray diffraction, and scanning electron microscopy (SEM/EDS). The leaching rate was measured by the product consistency test (PCT) method. The results indicate that the Sr^2+-SrTiO3 compound has high density, a low leaching rate, and high stability, and that the synthesis process is technologically and economically feasible. It can be concluded that strontium titanate synroc is an excellent material for immobilizing HLW.
Abstract: This paper studies the linkage problem between the results of high-level synthesis and back-end technology, presents a knowledge-based method of high-level technology mapping, and studies in depth all of its important aspects, such as knowledge representation, knowledge utilization, and knowledge acquisition. The contributions include: (1) an expanded form of production rule for representing knowledge of circuit structure; (2) a VHDL-based method for acquiring technology-mapping knowledge; (3) a control strategy and algorithm for knowledge utilization; (4) a semi-automatic maintenance method that can find redundancy and contradictions in the knowledge base; (5) a practical method for embedding algorithms into the knowledge system to reduce the complexity of the knowledge base. A system has been developed and linked with three kinds of technologies, verifying the work of this paper.
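As a toy illustration of rule-based technology mapping (the abstract's expanded productions are not reproduced), the sketch below represents each rule as an abstract operator, a maximum bit width, and a target library cell, and maps an operator by taking the first matching rule. All rule contents and cell names are hypothetical.

```cpp
// A minimal sketch of rule-based technology mapping with a toy rule format;
// the rules and cell names are invented, not the paper's knowledge base.
#include <iostream>
#include <string>
#include <vector>

struct Rule {
    std::string op;    // abstract operator, e.g. "add"
    int max_width;     // rule applies up to this bit width
    std::string cell;  // library cell chosen by the rule
};

std::string map_operator(const std::vector<Rule>& rules,
                         const std::string& op, int width) {
    for (const auto& r : rules)  // first matching rule wins
        if (r.op == op && width <= r.max_width) return r.cell;
    return "UNMAPPED";
}

int main() {
    std::vector<Rule> rules = {
        {"add", 8,  "RCA8"},   // small adders -> ripple-carry cell
        {"add", 32, "CLA32"},  // wider adders -> carry-lookahead cell
        {"mul", 16, "MUL16"},
    };
    std::cout << map_operator(rules, "add", 12) << "\n";  // CLA32
    std::cout << map_operator(rules, "mul", 24) << "\n";  // UNMAPPED
    return 0;
}
```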
Funding: Supported by the National Key Basic Research and Development (973) Program of China (No. 2005CB321604) and the National Natural Science Foundation of China (No. 60633060).
Abstract: Register allocation in high-level circuit synthesis is important not only for reducing area, delay, and power overheads, but also for improving the testability of the synthesized circuits. This paper presents an improved register allocation algorithm, called weighted graph-based balanced register allocation, that improves testability in high-level circuit synthesis. The controllability and observability of the registers and the elimination of self-loops are analyzed to form a weighted conflict graph, where the weight of the edge between two nodes denotes the tendency of the two variables to share the same register. A modified desaturation algorithm is then used to dynamically modify the weights and obtain a final balanced register allocation, which improves the testability of the synthesized circuits. Tests on some benchmarks show that the algorithm achieves higher fault coverage than other algorithms, with less area overhead and even less time delay.
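The weighted desaturation algorithm itself is not given in the abstract; the sketch below shows only the underlying framework it refines: greedy saturation-based coloring of a variable conflict graph, with each color standing for one register. The example graph is made up, and the paper's edge weights and balancing step are deliberately omitted.

```cpp
// Illustrative sketch only: saturation-driven greedy coloring of a variable
// conflict graph, where each color is one register. The paper's weighted,
// balanced refinement is not reproduced here.
#include <iostream>
#include <set>
#include <vector>

int main() {
    // Conflict graph: an edge means two variables are alive at the same time
    // and therefore cannot share a register (example data, assumed).
    int n = 5;
    std::vector<std::vector<int>> adj = {
        {1, 2},     // v0 conflicts with v1, v2
        {0, 2, 3},  // v1
        {0, 1, 4},  // v2
        {1},        // v3
        {2}};       // v4

    std::vector<int> color(n, -1);
    for (int step = 0; step < n; ++step) {
        // Pick the uncolored vertex with the highest saturation
        // (number of distinct colors already used by its neighbors).
        int best = -1, best_sat = -1;
        for (int v = 0; v < n; ++v) {
            if (color[v] != -1) continue;
            std::set<int> sat;
            for (int u : adj[v]) if (color[u] != -1) sat.insert(color[u]);
            if (static_cast<int>(sat.size()) > best_sat) {
                best_sat = static_cast<int>(sat.size());
                best = v;
            }
        }
        // Assign the smallest register index not used by any neighbor.
        std::set<int> used;
        for (int u : adj[best]) if (color[u] != -1) used.insert(color[u]);
        int c = 0;
        while (used.count(c)) ++c;
        color[best] = c;
    }

    for (int v = 0; v < n; ++v)
        std::cout << "v" << v << " -> R" << color[v] << "\n";
    return 0;
}
```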
Abstract: This paper describes HLS/BIT, a VHDL high-level synthesis system, with emphasis on its register-transfer level (RTL) binding and technology mapping subsystem. The component instantiation mechanism and the knowledge-driven approach to RTL technology mapping are presented in more detail.
Funding: Supported by the National Key Basic Research and Development (973) Program of China (No. 2005CB321604) and the National Natural Science Foundation of China (No. 60633060).
Abstract: Scheduling is an important step in high-level synthesis and can greatly influence the testability of the synthesized circuits. This paper presents an efficient testability-improved data path scheduling scheme based on mobility scheduling, in which scheduling begins with the operation that has the least mobility. In our data path scheduling scheme, the lifetimes of the I/O variables are made as short as possible to increase the possibility of intermediate variables being allocated to the I/O registers. In this way, the controllability and observability of the intermediate variables can be improved. Combined with a weighted graph-based register allocation method, this scheme obtains better testability. Experimental results on some benchmarks and example circuits show that the proposed scheme achieves higher fault coverage than other scheduling schemes, with little area overhead and even less time delay.
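A rough sketch of mobility-driven list scheduling, the framework this scheme builds on: mobility is the gap between an operation's ASAP (as soon as possible) and ALAP (as late as possible) steps, and at each control step the ready operations with the least mobility are scheduled first under a functional-unit limit. The tiny data-flow graph, latency bound, and two-units-per-step limit are assumptions, and the paper's testability-oriented lifetime handling is not modeled.

```cpp
// Illustrative sketch of mobility-driven list scheduling (not the paper's
// exact scheme). Mobility = ALAP - ASAP; ready operations with the least
// mobility are scheduled first at each control step.
#include <algorithm>
#include <iostream>
#include <utility>
#include <vector>

int main() {
    // DFG edges: pred -> succ (example, assumed).
    int n = 6;
    std::vector<std::pair<int, int>> edges = {{0, 2}, {1, 2}, {2, 4}, {3, 4}, {3, 5}};
    int latency = 4;          // overall latency constraint (steps 1..4)
    int units_per_step = 2;   // functional units available per step (assumed)

    // ASAP: longest path from sources; ALAP: latency minus longest path to sinks.
    std::vector<int> asap(n, 1), alap(n, latency);
    for (int iter = 0; iter < n; ++iter) {  // simple relaxation until fixed point
        for (auto [u, v] : edges) asap[v] = std::max(asap[v], asap[u] + 1);
        for (auto [u, v] : edges) alap[u] = std::min(alap[u], alap[v] - 1);
    }

    std::vector<int> sched(n, 0);  // 0 = not yet scheduled
    for (int step = 1; step <= latency; ++step) {
        // An operation is ready if every predecessor finished in an earlier step.
        std::vector<int> ready;
        for (int v = 0; v < n; ++v) {
            if (sched[v]) continue;
            bool ok = true;
            for (auto [u, w] : edges)
                if (w == v && (sched[u] == 0 || sched[u] >= step)) ok = false;
            if (ok) ready.push_back(v);
        }
        // Least mobility first.
        std::sort(ready.begin(), ready.end(), [&](int a, int b) {
            return (alap[a] - asap[a]) < (alap[b] - asap[b]);
        });
        int take = std::min(units_per_step, static_cast<int>(ready.size()));
        for (int i = 0; i < take; ++i) sched[ready[i]] = step;
    }

    for (int v = 0; v < n; ++v)
        std::cout << "op" << v << ": step " << sched[v]
                  << " (mobility " << alap[v] - asap[v] << ")\n";
    return 0;
}
```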
Funding: This work was supported by the National Natural Science Foundation of China under Grant No. 61772227, the Development Project of Jilin Province of China under Grant Nos. 20190201273JC and 2020C003, the Guangdong Key Project for Applied Fundamental Research under Grant No. 2018KZDXM076, and the Jilin Provincial Key Laboratory of Big Data Intelligent Computing under Grant No. 20180622002JC.
Abstract: Field-programmable gate arrays (FPGAs) have recently evolved into a valuable component of heterogeneous computing. Register transfer level (RTL) design flows demand that designers be experienced in hardware, which can result in a failure to meet time-to-market goals. High-level synthesis (HLS) permits designers to work at a higher level of abstraction by synthesizing high-level language programs into RTL descriptions, which provides a promising approach to solving these problems. However, the performance of HLS tools still has limitations. For example, designers remain exposed to various aspects of hardware design, development cycles are still time consuming, and the quality of results (QoR) of HLS tools is far behind that of RTL flows. In this paper, we survey the literature published since 2014 that focuses on the performance optimization of HLS tools. Compared with previous work, we extend the scope of the performance of HLS tools and present a set of three-level evaluation criteria, covering everything from ease of use of the HLS tools to improvement on specific QoR metrics. We also propose performance evaluation equations for describing the relation between performance optimization and QoR. We find that more effort on ease of use is needed for efficient HLS tools. We suggest drawing an analogy between the HLS development process and the embedded system design process, and providing a more elastic HLS methodology that integrates FPGA virtual machines.
Funding: Supported by the National Key Research and Development Program of China under Grant No. 2018YFA0701800, the National Natural Science Foundation of China under Grant Nos. 61821003 and 62172175, and Alibaba Group through the Alibaba Innovative Research (AIR) Program.
Abstract: Low-Density Parity-Check (LDPC) codes, with their excellent error-correction capabilities, have been widely used in both data communication and storage to construct reliable cyber-physical systems that are resilient to real-world noise. Fast prototyping of field-programmable gate array (FPGA)-based decoders is essential to achieve high decoding performance while accelerating the development process. This paper proposes a three-level parallel architecture, TLP-LDPC, to achieve high throughput by fully exploiting the characteristics of both LDPC codes and the underlying hardware while scaling effectively to large FPGA platforms. The three-level parallel architecture contains a low-level decoding unit, a mid-level multi-unit decoding core, and a high-level multi-core decoder. The low-level decoding unit is a basic LDPC computation component that effectively combines the features of the LDPC algorithm and the hardware with specific FPGA structures (e.g., Look-Up Tables, LUTs) and eliminates potential data conflicts. The mid-level decoding core integrates the input/output and multiple decoding units in a well-balanced pipelined fashion. The top-level multi-core architecture conveniently makes full use of board-level resources to improve overall throughput. We develop LDPC C++ code with dedicated pragmas and leverage HLS tools to implement the TLP-LDPC architecture. Experimental results show that TLP-LDPC achieves 9.63 Gbps end-to-end decoding throughput on a Xilinx Alveo U50 platform, 3.9x higher than existing HLS-based FPGA implementations.
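The TLP-LDPC source is not given in the abstract; the fragment below only illustrates the style of HLS C++ with dedicated pragmas that the paper describes, using a plain min-sum check-node update. The pragma spellings follow Xilinx Vivado/Vitis HLS; the row degree, floating-point types, and function names are assumptions for readability.

```cpp
// A minimal sketch (not the TLP-LDPC source) of HLS-style C++ with pragmas:
// a min-sum check-node update over a fixed row degree. The pragmas are
// ignored by a normal compiler, so the file also runs as plain C++.
#include <cmath>
#include <iostream>

constexpr int ROW_DEGREE = 8;

void check_node_update(const float llr_in[ROW_DEGREE], float llr_out[ROW_DEGREE]) {
#pragma HLS ARRAY_PARTITION variable=llr_in complete dim=1
#pragma HLS ARRAY_PARTITION variable=llr_out complete dim=1
#pragma HLS PIPELINE II=1
    // Min-sum: each output is the product of the input signs (excluding itself)
    // times the minimum input magnitude (excluding itself).
    for (int i = 0; i < ROW_DEGREE; ++i) {
#pragma HLS UNROLL
        float min_mag = 1e30f;
        float sign = 1.0f;
        for (int j = 0; j < ROW_DEGREE; ++j) {
#pragma HLS UNROLL
            if (j == i) continue;
            float mag = std::fabs(llr_in[j]);
            min_mag = (mag < min_mag) ? mag : min_mag;
            sign *= (llr_in[j] < 0.0f) ? -1.0f : 1.0f;
        }
        llr_out[i] = sign * min_mag;
    }
}

int main() {
    // Simple software testbench, as used for HLS C simulation.
    float in[ROW_DEGREE] = {1.5f, -0.3f, 2.0f, -4.0f, 0.7f, 1.1f, -2.2f, 0.9f};
    float out[ROW_DEGREE];
    check_node_update(in, out);
    for (int i = 0; i < ROW_DEGREE; ++i) std::cout << out[i] << "\n";
    return 0;
}
```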
Abstract: Area and test time are two major overheads encountered during data path high-level synthesis for BIST. This paper presents an approach to behavioral synthesis for loop-based BIST. By taking the requirements of the BIST scheme into account during the behavioral synthesis process, an area-optimal BIST solution can be obtained. The approach is based on the reuse of test resources, which results in fewer registers being modified into test registers. This is achieved by incorporating self-testability constraints during register assignment. Experimental results on benchmarks are presented to demonstrate the effectiveness of the approach.
Abstract: With the development of VLSI technology, design-for-test methods have become an indispensable part of research on VLSI design methodology. Testing design technology can noticeably reduce the cost of a chip and help win in time-to-market. Conventional design-for-test methods improve the testability of a system generally by modifying the gate-level architecture. Research in high-level test synthesis has been emphasized to fit the trend of high-level VLSI design. In this paper, based on an analysis of different types of testing design methods, a novel compound design-for-test strategy for VLSI systems is proposed.
Funding: Supported by the National Natural Science Foundation of China (Nos. 90407005, 90207017, 60236020, and 60121120706).
Abstract: As the feature size of integrated circuits is reduced to the deep sub-micron or nanometer level, interconnect delay is becoming more and more important in determining the total delay of a circuit. Re-synthesis after floorplanning is expected to be very helpful for reducing the interconnect delay of a circuit. In this paper, a force-balance-based re-synthesis algorithm for interconnect delay optimization after floorplanning is proposed. The algorithm optimizes the interconnect delay by changing the operation scheduling and the functional unit allocation and binding. With this method, the number and positions of the functional units are not changed, but some operations are allocated or bound to different units. Preliminary experimental results show that the interconnect wire delays are reduced efficiently without degrading the floorplan performance.
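The force-balance formulation is not detailed in the abstract; as a hypothetical illustration of interconnect-aware re-binding, the sketch below picks, among compatible functional units whose positions are fixed by the floorplan, the one that minimizes the total Manhattan distance to the units producing the operation's operands. All coordinates and unit choices are invented for the example.

```cpp
// Hypothetical sketch (not the paper's force-balance algorithm): re-bind an
// operation to the compatible functional unit that minimizes total Manhattan
// distance to its operand producers, as a proxy for interconnect delay.
#include <cstdlib>
#include <iostream>
#include <vector>

struct Unit { int x, y; };  // position fixed by the floorplan

static int manhattan(const Unit& a, const Unit& b) {
    return std::abs(a.x - b.x) + std::abs(a.y - b.y);
}

int main() {
    // Candidate adders the operation could be bound to.
    std::vector<Unit> candidates = {{0, 0}, {6, 1}, {3, 5}};
    // Units that produce the operation's two operands.
    std::vector<Unit> producers = {{5, 0}, {7, 3}};

    int best = -1, best_cost = 1 << 30;
    for (size_t i = 0; i < candidates.size(); ++i) {
        int cost = 0;
        for (const auto& p : producers) cost += manhattan(candidates[i], p);
        if (cost < best_cost) { best_cost = cost; best = static_cast<int>(i); }
    }
    std::cout << "bind to adder " << best
              << " (estimated wire cost " << best_cost << ")\n";
    return 0;
}
```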
Funding: Project (No. Y105355) supported by the Natural Science Foundation of Zhejiang Province, China.
Abstract: Scheduling chain combination is the core of chain-based scheduling algorithms, and its speed determines the overall performance of the corresponding scheduling algorithm. However, general combination algorithms use backtracking to traverse the whole search space, which may introduce redundant operations, so their performance is generally poor. To overcome this problem, this paper presents a fast scheduling chain combination algorithm that avoids redundant operations by skipping "incompatible" steps of scheduling chains and using a stack to remember the scheduling state. Experimental results show that it can improve the performance of scheduling algorithms by up to 15 times. By further omitting unnecessary operations, a fast algorithm for minimum combination length prediction is developed, which can improve the speed by up to 10 times.
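A speculative sketch of the combination-length idea (the paper's stack-based algorithm is not reproduced): if two scheduling chains are modeled as per-step functional-unit demands, the largest overlap at which their summed demands never exceed the unit capacity gives the minimum combined length, and incompatible overlaps are simply skipped. The chains and capacity below are made up.

```cpp
// Speculative illustration only: predict the minimum combined length of two
// scheduling chains modeled as per-step functional-unit demands, skipping
// overlaps whose steps are incompatible with the unit capacity.
#include <algorithm>
#include <iostream>
#include <vector>

int main() {
    std::vector<int> a = {2, 1, 2, 1};  // FU demand of chain A per step (example)
    std::vector<int> b = {1, 2, 2};     // FU demand of chain B per step (example)
    int capacity = 3;                   // FUs available per step (assumed)

    // Try the largest overlap first; shrink it while any overlapping step is incompatible.
    int best_overlap = 0;
    int max_overlap = static_cast<int>(std::min(a.size(), b.size()));
    for (int overlap = max_overlap; overlap > 0; --overlap) {
        bool ok = true;
        for (int i = 0; i < overlap; ++i) {
            // Step (a.size() - overlap + i) of A runs together with step i of B.
            if (a[a.size() - overlap + i] + b[i] > capacity) { ok = false; break; }
        }
        if (ok) { best_overlap = overlap; break; }
    }

    std::cout << "minimum combined length: "
              << a.size() + b.size() - best_overlap << " steps\n";
    return 0;
}
```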