Abstract: To realize the performance potential of an application-specific instruction-set processor (ASIP), this paper discusses the different aspects of an ASIP instruction set specification, namely syntax, encoding, constraints, and behaviors, and introduces an ADL-model-based methodology for checking them. It shows how test cases are generated automatically from a straightforward instruction representation, and how they can be generated efficiently with good coverage. The constraint checker, an important tool for programmers, is also verified. Results show that the toolkit finds errors in previously delivered tools and that the methodology confirms the feasibility of the instruction set specification.
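The idea of deriving test cases from an instruction representation can be sketched as follows. This is a minimal illustration, not the paper's toolkit: the 16-bit format, the field names, the boundary-value strategy, and the single `rd != rs` constraint are all hypothetical, chosen only to show encoding, constraint checking, and automatic test enumeration side by side.

```python
from itertools import product

# Hypothetical 16-bit instruction format: (field name, bit width), MSB first.
FIELDS = [("opcode", 4), ("rd", 4), ("rs", 4), ("imm", 4)]


def encode(values):
    """Pack a dict of field values into one instruction word."""
    word = 0
    for name, width in FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return word


def decode(word):
    """Unpack an instruction word back into its fields."""
    values = {}
    for name, width in reversed(FIELDS):
        values[name] = word & ((1 << width) - 1)
        word >>= width
    return values


def satisfies_constraints(values):
    """Illustrative constraint: source and destination registers must differ."""
    return values["rd"] != values["rs"]


def boundary_values(width):
    """Smallest, middle, and largest value representable in `width` bits."""
    top = (1 << width) - 1
    return sorted({0, top // 2, top})


def generate_tests():
    """Yield (fields, word, legal) for every combination of boundary values."""
    names = [name for name, _ in FIELDS]
    choices = [boundary_values(width) for _, width in FIELDS]
    for combo in product(*choices):
        values = dict(zip(names, combo))
        yield values, encode(values), satisfies_constraints(values)


tests = list(generate_tests())
```

Each generated case carries both the encoded word and the expected verdict of the constraint check, so the same list can exercise an assembler (encode and decode must round-trip) and a constraint checker (legal and illegal cases are both present).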
Funding: Supported by the Industrial Internet Innovation and Development Project of the Ministry of Industry and Information Technology (No. GHBJ2004).
Abstract: A Taylor series expansion (TSE) based design for minimum mean-square error (MMSE) detection and QR decomposition (QRD) in multi-input multi-output (MIMO) systems is proposed on an application-specific instruction-set processor (ASIP). It uses a TSE algorithm in place of the resource-consuming reciprocal and reciprocal square root (RSR) operations. The aim is a high-performance implementation of both MMSE and QRD on a single programmable platform. The instruction set architecture (ISA) and the allocation of data paths in a single instruction multiple data, very long instruction word (SIMD-VLIW) architecture are presented, offering more data parallelism and instruction parallelism for matrices of different dimensions and for different operation types. Multiple levels of numerical precision can be obtained by varying the table size and expansion order in the TSE ISA. The ASIP has been implemented in a 28 nm CMOS process and reaches a frequency of 800 MHz. Experimental results show that the proposed design provides perfect numerical precision within the fixed bit width of the ASIP, a matrix processing rate that exceeds the requirements of 5G systems, and a rate-area efficiency comparable with ASIC implementations.
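How table size and expansion order trade off against precision can be illustrated with a small floating-point sketch. It is not the paper's fixed-point datapath: the midpoint segmentation of [1, 2), the stored table contents, and the function names `build_table` and `tse_power` are assumptions made for illustration. The sketch evaluates x^alpha as x0^alpha (1 + d)^alpha with a truncated binomial series, where alpha = -1 gives the reciprocal and alpha = -1/2 the reciprocal square root.

```python
def build_table(size, alpha):
    """Tabulate (x0, x0**alpha, 1/x0) at the midpoints of `size` segments of [1, 2)."""
    table = []
    for i in range(size):
        x0 = 1.0 + (i + 0.5) / size
        table.append((x0, x0 ** alpha, 1.0 / x0))
    return table


def tse_power(x, alpha, table, order):
    """Approximate x**alpha for x in [1, 2) without dividing at evaluation time."""
    if not 1.0 <= x < 2.0:
        raise ValueError("x must be normalised into [1, 2)")
    size = len(table)
    index = min(int((x - 1.0) * size), size - 1)
    x0, f0, inv_x0 = table[index]
    d = (x - x0) * inv_x0            # relative offset, |d| <= 1 / (2 * size)
    coeff, term, total = 1.0, 1.0, 1.0
    for k in range(1, order + 1):
        coeff *= (alpha - (k - 1)) / k   # generalised binomial coefficient C(alpha, k)
        term *= d
        total += coeff * term
    return f0 * total


# Usage: a 16-entry table with a second-order expansion.
reciprocal_table = build_table(16, -1.0)
rsqrt_table = build_table(16, -0.5)
approx_reciprocal = tse_power(1.3, -1.0, reciprocal_table, 2)
approx_rsqrt = tse_power(1.3, -0.5, rsqrt_table, 2)
```

Because |d| shrinks as the table grows and the truncation error falls roughly as |d|^(order + 1), a larger table or a higher order each buy precision, which is the knob the abstract describes.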
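Abstract: The diversity of embedded applications and the time-to-market pressure on their design pose challenges for the architecture design of application-specific instruction-set processors (ASIPs). This paper proposes A2IDE, an ASIP design platform that divides the system-level ASIP design task into three levels (instruction set, pipeline, and microarchitecture) and uses an architecture description language to drive both the automatic generation of the software toolset and the design space exploration at each level. The features and architecture of A2IDE are described, and experiments provide preliminary evidence of the platform's effectiveness.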
Funding: Supported by the National Natural Science Foundation of China (Grant No. 61404175).
Abstract: An efficient and flexible implementation of block ciphers is critical for secure information processing. Existing implementation methods, such as GPPs, FPGAs, and cryptographic ASICs, provide a broad range of support, but they do not achieve a good tradeoff between high-speed processing and flexibility. In this paper, we present a reconfigurable VLIW processor architecture targeted at block cipher processing, analyze the basic operations and storage characteristics of block ciphers, and propose a multi-cluster register-file structure for them. For operation elements shared across block ciphers, we adopt reconfigurable technology for the multiple cryptographic processing units and the interconnection scheme. The proposed processor not only flexibly combines multiple basic cryptographic operations but also supports dynamic configuration of the cryptographic processing units. It has been implemented in 0.18 μm CMOS technology; test results show that the frequency can reach 350 MHz and that power consumption is 420 mW. Ten block and hash ciphers were realized on the processor. The throughput of the AES, DES, IDEA, and SHA-1 algorithms is 1554 Mbps, 448 Mbps, 785 Mbps, and 424 Mbps, respectively. The test results show that the processor's encryption performance is significantly higher than that of other designs.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 61404175).
Abstract: Stream ciphers are an important branch of information security algorithms, and their efficient and flexible implementation is vital. Existing implementation methods, such as FPGAs, GPPs, and ASICs, provide good support, but they do not achieve a good tradeoff between high-speed processing and high flexibility: an ASIC is fast but inflexible, a GPP is flexible but slow, and an FPGA offers both flexibility and speed but has very low resource utilization. This paper studies a stream cryptographic processor that can efficiently and flexibly implement a variety of stream cipher algorithms. By analyzing the structural model, processing characteristics, and storage characteristics of stream ciphers, it presents a reconfigurable, VLIW-based stream cryptographic processor with special instructions, which has a separated/clustered storage structure and is oriented to stream cipher operations. The proposed instruction structure effectively supports stream cipher processing at multiple data bit widths, parallelism among stream cipher operations of different bit widths, and parallelism between branch control and stream cipher processing, with high instruction-level parallelism. The separated/clustered special bit registers, general register files, and key register files satisfy cryptographic requirements. The proposed processor can therefore flexibly combine multiple basic stream cipher operations to implement complete stream cipher algorithms. It has been implemented in 0.18 μm CMOS technology; test results show that the frequency can reach 200 MHz and that power consumption is 310 mW. Ten stream ciphers were realized on the processor. The key stream generation throughput of the Grain-80, W7, MICKEY, ACHTERBAHN, and Shrink algorithms is 100 Mbps, 66.67 Mbps, 66.67 Mbps, 50 Mbps, and 800 Mbps, respectively. The test results show that the processor achieves a good tradeoff between performance and flexibility for stream ciphers.
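The reported figures can be related by a short worked calculation. It assumes, which the abstract does not state explicitly, that each throughput was measured at the reported 200 MHz clock; under that assumption, frequency divided by throughput gives the average number of clock cycles spent per key stream bit. The helper name `cycles_per_bit` is illustrative.

```python
FREQ_MHZ = 200.0  # clock frequency reported in the abstract

# Key stream throughput reported in the abstract, in Mbps.
THROUGHPUT_MBPS = {
    "Grain-80": 100.0,
    "W7": 66.67,
    "MICKEY": 66.67,
    "ACHTERBAHN": 50.0,
    "Shrink": 800.0,
}


def cycles_per_bit(throughput_mbps, freq_mhz):
    """Average clock cycles per output bit, assuming throughput was measured at freq_mhz."""
    return freq_mhz / throughput_mbps


for name, mbps in THROUGHPUT_MBPS.items():
    print(f"{name}: {cycles_per_bit(mbps, FREQ_MHZ):.2f} cycles per key stream bit")
```

Under the stated assumption, the values cluster near small whole numbers of cycles per bit, except for Shrink, which would produce several bits per cycle.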
Funding: Supported by the National High Technology Research and Development Program of China (863 Program) (No. 2008AA01Z103).
Abstract: Because elliptic curve cryptography (ECC) holds a dominant position in public-key cryptography applications, the need for a flexible and efficient implementation has become increasingly pressing. Based on an analysis of the basic structural features of ECC algorithms, a parallel scheduling algorithm for point addition and point doubling is presented. Building on this scheduling algorithm, the paper also proposes an application-specific instruction-set coprocessor for ECC that adopts a VLIW architecture. The coprocessor is implemented and validated on an Altera FPGA. Experimental results show that the proposed coprocessor offers both high performance and flexibility.
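For readers unfamiliar with the two operations being scheduled, the sketch below shows point addition and point doubling in affine coordinates, combined by plain double-and-add scalar multiplication. It is a textbook illustration on a toy curve, not the paper's parallel schedule or coprocessor: the curve y^2 = x^3 + 2x + 2 over GF(17), the base point (5, 1), and the function names are assumptions chosen for readability, and the code is sequential.

```python
# Toy curve y^2 = x^3 + A*x + B over GF(P); far too small for real use.
P, A, B = 17, 2, 2
G = (5, 1)  # base point on the toy curve; None represents the point at infinity


def on_curve(point):
    """Check that a point satisfies the curve equation."""
    if point is None:
        return True
    x, y = point
    return (y * y - (x * x * x + A * x + B)) % P == 0


def point_add(p1, p2):
    """Add two points; the p1 == p2 case is point doubling."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return None  # p2 is the negative of p1 (also covers doubling with y == 0)
    if p1 == p2:
        lam = (3 * x1 * x1 + A) * pow((2 * y1) % P, -1, P) % P   # tangent slope
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P          # chord slope
    x3 = (lam * lam - x1 - x2) % P
    y3 = (lam * (x1 - x3) - y1) % P
    return (x3, y3)


def scalar_mult(k, point):
    """Compute k * point by double-and-add, scanning k from its least significant bit."""
    result = None
    addend = point
    while k > 0:
        if k & 1:
            result = point_add(result, addend)
        addend = point_add(addend, addend)
        k >>= 1
    return result
```

Each call expands into a handful of field multiplications, subtractions, and one inversion. Operations of that kind with no data dependence on one another are what a scheduler could place into parallel VLIW slots.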
Funding: National Natural Science Foundation of China (No. 60273042).
Abstract: Because it offers greater flexibility at lower cost, the application-specific instruction-set processor (ASIP) is increasingly seen as an alternative implementation style to the application-specific integrated circuit (ASIC). The architecture model is a key component of the ASIP design flow. A novel ASIP model, xpMODEL, is presented. Its key features include explicit specification of the memory subsystem, which allows novel memory organizations and hierarchies, and the introduction of meta-operators and an instruction behavior extended finite state machine, which give xpMODEL the ability to model execution sequencing, inherent parallelism, data, control, and structural hazards, and out-of-order execution in an ASIP. A comparison with other ASIP models indicates the advantages of xpMODEL.