Microprocessor development emphasizes hardware and software co design. Hw/Sw co design is a modern technique aimed at shortening the time to market in designing the real time and embedded systems. Key feature of this ...Microprocessor development emphasizes hardware and software co design. Hw/Sw co design is a modern technique aimed at shortening the time to market in designing the real time and embedded systems. Key feature of this approach is simultaneous development of the program tools and the target processor to match software application. An effective co design flow must therefore support automatic software toolkits generation, without loss of optimizing efficiency. This has resulted in a paradigm shift towards a language based design methodology for microprocessor optimization and exploration. This paper proposes a formal grammar, UNI SPEC, which supports the automatic generation of assemblers, to describe the translation rules from assembly to binary. Based on UNI SPEC, it implements two typical applications, i.e., automatically generating the assembler and the test suites.展开更多
A Taylor series expansion(TSE) based design for minimum mean-square error(MMSE) and QR decomposition(QRD) of multi-input and multi-output(MIMO) systems is proposed based on application specific instruction set process...A Taylor series expansion(TSE) based design for minimum mean-square error(MMSE) and QR decomposition(QRD) of multi-input and multi-output(MIMO) systems is proposed based on application specific instruction set processor(ASIP), which uses TSE algorithm instead of resource-consuming reciprocal and reciprocal square root(RSR) operations.The aim is to give a high performance implementation for MMSE and QRD in one programmable platform simultaneously.Furthermore, instruction set architecture(ISA) and the allocation of data paths in single instruction multiple data-very long instruction word(SIMD-VLIW) architecture are provided, offering more data parallelism and instruction parallelism for different dimension matrices and operation types.Meanwhile, multiple level numerical precision can be achieved with flexible table size and expansion order in TSE ISA.The ASIP has been implemented to a 28 nm CMOS process and frequency reaches 800 MHz.Experimental results show that the proposed design provides perfect numerical precision within the fixed bit-width of the ASIP, higher matrix processing rate better than the requirements of 5G system and more rate-area efficiency comparable with ASIC implementations.展开更多
The 32-bit extensible embedded processor RISC3200 originating from an RTL prototype core is intended for low-cost consumer multimedia products. In order to incorporate the reduced instruction set and the multimedia ex...The 32-bit extensible embedded processor RISC3200 originating from an RTL prototype core is intended for low-cost consumer multimedia products. In order to incorporate the reduced instruction set and the multimedia extension instruction set in a unifying pipeline, a scalable super-pipeline technique is adopted. Several other optimization techniques are proposed to boost the frequency and reduce the average CPI of the unifying pipeline. Based on a data flow graph (DFG) with delay information, the critical path of the pipeline stage can be located and shortened. This paper presents a distributed data bypass unit and a centralized pipeline control scheme for achieving lower CPI. Synthesis and simulation showed that the optimization techniques enable RISC3200 to operate at 200 MHz with an average CPI of 1.16. The core was integrated into a media SOC chip taped out in SMIC 0.18-micron technology. Preliminary testing result showed that the processor works well as we expected.展开更多
As an important branch of information security algorithms,the efficient and flexible implementation of stream ciphers is vital.Existing implementation methods,such as FPGA,GPP and ASIC,provide a good support,but they ...As an important branch of information security algorithms,the efficient and flexible implementation of stream ciphers is vital.Existing implementation methods,such as FPGA,GPP and ASIC,provide a good support,but they could not achieve a better tradeoff between high speed processing and high flexibility.ASIC has fast processing speed,but its flexibility is poor,GPP has high flexibility,but the processing speed is slow,FPGA has high flexibility and processing speed,but the resource utilization is very low.This paper studies a stream cryptographic processor which can efficiently and flexibly implement a variety of stream cipher algorithms.By analyzing the structure model,processing characteristics and storage characteristics of stream ciphers,a reconfigurable stream cryptographic processor with special instructions based on VLIW is presented,which has separate/cluster storage structure and is oriented to stream cipher operations.The proposed instruction structure can effectively support stream cipher processing with multiple data bit widths,parallelism among stream cipher processing with different data bit widths,and parallelism among branch control and stream cipher processing with high instruction level parallelism;the designed separate/clustered special bit registers and general register heaps,key register heaps can satisfy cryptographic requirements.So the proposed processor not only flexibly accomplishes the combination of multiple basic stream cipher operations to finish stream cipher algorithms.It has been implemented with 0.18μm CMOS technology,the test results show that the frequency can reach 200 MHz,and power consumption is 310 mw.Ten kinds of stream ciphers were realized in the processor.The key stream generation throughput of Grain-80,W7,MICKEY,ACHTERBAHN and Shrink algorithm is 100 Mbps,66.67 Mbps,66.67 Mbps,50 Mbps and 800 Mbps,respectively.The test result shows that the processor presented can achieve good tradeoff between high performance and flexibility of stream ciphers.展开更多
This paper illustrates the importance of the configuration of function units and the change of an application’s critical path when using instruction set extension (ISE) with multi-issue architectures. This paper al...This paper illustrates the importance of the configuration of function units and the change of an application’s critical path when using instruction set extension (ISE) with multi-issue architectures. This paper also presents an automatic identification approach for customized instruction without input/output number constraints for multi-issue architectures. The approach identifies customized instructions using multiple attribute decision-making based on the analysis of several attributes for each candidate node. Tests indicate that the approach achieves higher speedup ratios than previous approaches, as well as less area cost. In addition, this approach provides designers with multiple candidate designs.展开更多
文摘Microprocessor development emphasizes hardware and software co design. Hw/Sw co design is a modern technique aimed at shortening the time to market in designing the real time and embedded systems. Key feature of this approach is simultaneous development of the program tools and the target processor to match software application. An effective co design flow must therefore support automatic software toolkits generation, without loss of optimizing efficiency. This has resulted in a paradigm shift towards a language based design methodology for microprocessor optimization and exploration. This paper proposes a formal grammar, UNI SPEC, which supports the automatic generation of assemblers, to describe the translation rules from assembly to binary. Based on UNI SPEC, it implements two typical applications, i.e., automatically generating the assembler and the test suites.
基金Supported by the Industrial Internet Innovation and Development Project of Ministry of Industry and Information Technology (No.GHBJ2004)。
文摘A Taylor series expansion(TSE) based design for minimum mean-square error(MMSE) and QR decomposition(QRD) of multi-input and multi-output(MIMO) systems is proposed based on application specific instruction set processor(ASIP), which uses TSE algorithm instead of resource-consuming reciprocal and reciprocal square root(RSR) operations.The aim is to give a high performance implementation for MMSE and QRD in one programmable platform simultaneously.Furthermore, instruction set architecture(ISA) and the allocation of data paths in single instruction multiple data-very long instruction word(SIMD-VLIW) architecture are provided, offering more data parallelism and instruction parallelism for different dimension matrices and operation types.Meanwhile, multiple level numerical precision can be achieved with flexible table size and expansion order in TSE ISA.The ASIP has been implemented to a 28 nm CMOS process and frequency reaches 800 MHz.Experimental results show that the proposed design provides perfect numerical precision within the fixed bit-width of the ASIP, higher matrix processing rate better than the requirements of 5G system and more rate-area efficiency comparable with ASIC implementations.
基金Project supported by the Hi-Tech Research and Development Pro-gram (863) of China (No. 2002 AA1Z1140) and the Fork Ying TongEducation Foundation (No. 94031), China
文摘The 32-bit extensible embedded processor RISC3200 originating from an RTL prototype core is intended for low-cost consumer multimedia products. In order to incorporate the reduced instruction set and the multimedia extension instruction set in a unifying pipeline, a scalable super-pipeline technique is adopted. Several other optimization techniques are proposed to boost the frequency and reduce the average CPI of the unifying pipeline. Based on a data flow graph (DFG) with delay information, the critical path of the pipeline stage can be located and shortened. This paper presents a distributed data bypass unit and a centralized pipeline control scheme for achieving lower CPI. Synthesis and simulation showed that the optimization techniques enable RISC3200 to operate at 200 MHz with an average CPI of 1.16. The core was integrated into a media SOC chip taped out in SMIC 0.18-micron technology. Preliminary testing result showed that the processor works well as we expected.
基金supported by National Natural Science Foundation of China with granted No.61404175
文摘As an important branch of information security algorithms,the efficient and flexible implementation of stream ciphers is vital.Existing implementation methods,such as FPGA,GPP and ASIC,provide a good support,but they could not achieve a better tradeoff between high speed processing and high flexibility.ASIC has fast processing speed,but its flexibility is poor,GPP has high flexibility,but the processing speed is slow,FPGA has high flexibility and processing speed,but the resource utilization is very low.This paper studies a stream cryptographic processor which can efficiently and flexibly implement a variety of stream cipher algorithms.By analyzing the structure model,processing characteristics and storage characteristics of stream ciphers,a reconfigurable stream cryptographic processor with special instructions based on VLIW is presented,which has separate/cluster storage structure and is oriented to stream cipher operations.The proposed instruction structure can effectively support stream cipher processing with multiple data bit widths,parallelism among stream cipher processing with different data bit widths,and parallelism among branch control and stream cipher processing with high instruction level parallelism;the designed separate/clustered special bit registers and general register heaps,key register heaps can satisfy cryptographic requirements.So the proposed processor not only flexibly accomplishes the combination of multiple basic stream cipher operations to finish stream cipher algorithms.It has been implemented with 0.18μm CMOS technology,the test results show that the frequency can reach 200 MHz,and power consumption is 310 mw.Ten kinds of stream ciphers were realized in the processor.The key stream generation throughput of Grain-80,W7,MICKEY,ACHTERBAHN and Shrink algorithm is 100 Mbps,66.67 Mbps,66.67 Mbps,50 Mbps and 800 Mbps,respectively.The test result shows that the processor presented can achieve good tradeoff between high performance and flexibility of stream ciphers.
基金Supported by the Basic Research Fund of Tsinghua University
文摘This paper illustrates the importance of the configuration of function units and the change of an application’s critical path when using instruction set extension (ISE) with multi-issue architectures. This paper also presents an automatic identification approach for customized instruction without input/output number constraints for multi-issue architectures. The approach identifies customized instructions using multiple attribute decision-making based on the analysis of several attributes for each candidate node. Tests indicate that the approach achieves higher speedup ratios than previous approaches, as well as less area cost. In addition, this approach provides designers with multiple candidate designs.