Abstract: The SubBytes (S-box) transformation is the most crucial operation in the AES algorithm and significantly affects the implementation performance of AES chips. To design a high-performance S-box, this paper proposes a segmented optimization implementation of the S-box based on the composite field inverse operation. The proposed S-box is modeled in Verilog and, once the simulation results have been verified as correct, synthesized with Design Compiler. The synthesis results show that, compared with several current S-box implementation schemes, the proposed implementation significantly reduces area overhead and critical path delay, and thus achieves higher hardware efficiency. This provides strong support for realizing efficient and compact S-box ASIC designs.
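The key idea behind composite-field S-boxes is that an inversion in GF(2^8) can be moved into GF((2^4)^2), where an element a·x + b (with a, b in GF(2^4) and x^2 + x + λ irreducible) has the inverse (a·x + (a + b))·Δ^-1 with Δ = a^2·λ + a·b + b^2. One GF(2^8) inversion thus becomes one GF(2^4) inversion plus a few subfield multiplications. The Python sketch below shows only this inversion step. The subfield polynomial y^4 + y + 1 and the λ chosen by search are assumptions, not the paper's parameters, and the isomorphic mapping to and from the AES field and the final affine transformation are omitted.

```python
# Assumed parameters: GF(2^4) modulus y^4 + y + 1, lambda found by search below.
# The paper's actual field polynomials and isomorphism matrices are not given.
GF16_POLY = 0b10011  # y^4 + y + 1


def gf16_mul(a: int, b: int) -> int:
    """Multiply two GF(2^4) elements (4-bit ints) modulo y^4 + y + 1."""
    result = 0
    for _ in range(4):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x10:  # a degree-4 term appeared: reduce it
            a ^= GF16_POLY
    return result


def gf16_inv(a: int) -> int:
    """Inverse in GF(2^4) computed as a^14 (a^15 = 1); maps 0 to 0 like the S-box."""
    result = 1
    for _ in range(14):
        result = gf16_mul(result, a)
    return result


# Choose lambda so that x^2 + x + lambda has no root in GF(2^4), i.e. is irreducible.
LAMBDA = next(lam for lam in range(1, 16)
              if all(gf16_mul(t, t) ^ t != lam for t in range(16)))


def gf256_mul(p, q):
    """Multiply (a*x + b)(c*x + d) in GF((2^4)^2), using x^2 = x + lambda."""
    a, b = p
    c, d = q
    ac = gf16_mul(a, c)
    high = ac ^ gf16_mul(a, d) ^ gf16_mul(b, c)
    low = gf16_mul(ac, LAMBDA) ^ gf16_mul(b, d)
    return (high, low)


def gf256_inv(p):
    """Invert a*x + b with a single GF(2^4) inversion of delta."""
    a, b = p
    delta = gf16_mul(gf16_mul(a, a), LAMBDA) ^ gf16_mul(a, b) ^ gf16_mul(b, b)
    delta_inv = gf16_inv(delta)
    return (gf16_mul(a, delta_inv), gf16_mul(a ^ b, delta_inv))
```

Elements are (high, low) pairs, so the multiplicative identity is (0, 1). In hardware, the subfield multipliers and the 4-bit inverter are the pieces that segmentation and pipelining would typically target.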
Funding: Supported by the Scientific and Technological Research Council of Turkey (TUBITAK) under Grant No. 2219-1/2013
Abstract: The recently developed single-reference second-order Brillouin-Wigner perturbation theory, which eliminates the size-extensivity error of the method, has been generalized to a state-specific, multi-reference (SS-MR) BWPT2 that provides a size-extensive correction to the electron correlation problem for systems that require a multi-reference function. Illustrative numerical tests of the size-extensivity corrections are made for commonly used test molecules whose ground states have pronounced multi-reference character. We have implemented two-reference and three-reference cases for CH2, BH, and the bond-breaking process in the ground state of the HF molecule. The results are compared with rigorously size-extensive methods, such as second-order Møller-Plesset perturbation theory (MP2), full configuration interaction (Full-CI), and allied methods, using the same basis sets.
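Because the exact energy E appears in the Brillouin-Wigner denominators, the second-order energy E = H00 + Σk |H0k|^2 / (E - Hkk) has to be solved self-consistently. The Python sketch below does this by fixed-point iteration on a small Hamiltonian matrix. The diagonal (Epstein-Nesbet-like) denominators and the toy matrices are assumptions for illustration; the sketch implements plain single-reference BW2 only, not the paper's size-extensivity correction or its multi-reference generalization. The usage lines show the problem being corrected: for two non-interacting copies of the same two-level system, plain BW2 does not give twice the single-system energy.

```python
def bw2_energy(h, tol=1e-12, max_iter=200):
    """Plain second-order Brillouin-Wigner energy of reference state 0.

    h is a small real symmetric Hamiltonian given as a list of lists.
    E appears on both sides, so the equation is iterated to self-consistency.
    """
    e = h[0][0]
    for _ in range(max_iter):
        e_new = h[0][0] + sum(h[0][k] * h[k][0] / (e - h[k][k])
                              for k in range(1, len(h)))
        if abs(e_new - e) < tol:
            return e_new
        e = e_new
    raise RuntimeError("BW2 iteration did not converge")


# Toy two-level system A (hypothetical numbers).
h_a = [[0.0, 0.2],
       [0.2, 1.0]]

# Two non-interacting copies of A in the product basis |00>, |01>, |10>, |11>.
h_ab = [[0.0, 0.2, 0.2, 0.0],
        [0.2, 1.0, 0.0, 0.2],
        [0.2, 0.0, 1.0, 0.2],
        [0.0, 0.2, 0.2, 2.0]]

# A size-extensive method would give equal numbers; plain BW2 does not.
print(2 * bw2_energy(h_a), bw2_energy(h_ab))
```

For a 2×2 matrix the BW2 fixed point coincides with the exact lowest eigenvalue, which makes a convenient sanity check.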
Abstract: This paper provides some insights into the application of Field Programmable Gate Arrays (FPGAs) in process tomography. It focuses on the performance of the technology across various tomography systems and compares it with similar technologies, including the Application Specific Integrated Circuit (ASIC), the Graphics Processing Unit (GPU), and the microcontroller. The FPGA is used primarily in the Data Acquisition System (DAQ) because it offers better performance and a better trade-off than competing technologies. Its drawback, however, is that it is relatively expensive.
Abstract: The surface distribution and seasonal variation of alkalinity and specific alkalinity in the Kuroshio area of the East China Sea, and their application to water mass tracing, are discussed in this paper. The results show a distinct seasonal variation in alkalinity, which is related to vertical mixing. Different water masses were found to have different specific alkalinities. On the basis of the differences in specific alkalinity and the distribution of alkalinity, two water fronts were found in summer, front (Ⅰ) at 27°-30°N, 124°-127°E and front (Ⅱ) in the waters about one degree of latitude north of Taiwan Island, and one front was found in winter, about one degree of longitude off the coast of mainland China at 26°-30°N. A comparison of data from May and August shows that front (Ⅰ) shifts eastward by about 1-2 degrees of longitude during summer. The high alkalinity of the northern East China Sea in summer may be caused by Huanghe River runoff flowing southward with the Huanghai Sea Coastal Current.
Funding: Supported by the Industrial Internet Innovation and Development Project of the Ministry of Industry and Information Technology (No. GHBJ2004).
Abstract: A Taylor series expansion (TSE) based design for minimum mean-square error (MMSE) and QR decomposition (QRD) of multi-input multi-output (MIMO) systems is proposed on an application specific instruction set processor (ASIP); it uses the TSE algorithm instead of resource-consuming reciprocal and reciprocal square root (RSR) operations. The aim is a high-performance implementation of both MMSE and QRD on a single programmable platform. Furthermore, the instruction set architecture (ISA) and the allocation of data paths in a single instruction multiple data-very long instruction word (SIMD-VLIW) architecture are provided, offering more data parallelism and instruction parallelism for matrices of different dimensions and for different operation types. Meanwhile, multiple levels of numerical precision can be achieved with a flexible table size and expansion order in the TSE ISA. The ASIP has been implemented in a 28 nm CMOS process, and its frequency reaches 800 MHz. Experimental results show that the proposed design provides perfect numerical precision within the fixed bit-width of the ASIP, a matrix processing rate that exceeds the requirements of 5G systems, and rate-area efficiency comparable with ASIC implementations.
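The abstract says precision is tuned by two knobs: the table size and the expansion order. The Python sketch below shows how such a scheme can replace reciprocal (p = -1) and reciprocal square root (p = -0.5) operations: the normalized input range [1, 2) is split into 2^table_bits segments, a table holds x0^p at each segment midpoint x0, and (1 + u)^p with u = (x - x0)/x0 is expanded as a truncated binomial (Taylor) series. The segment layout, the normalization range, and the function name `taylor_pow` are assumptions; a real ASIP would use fixed-point arithmetic rather than Python floats.

```python
def taylor_pow(x: float, p: float, table_bits: int = 4, order: int = 2) -> float:
    """Approximate x**p for x in [1, 2) (p = -1: reciprocal, p = -0.5: RSR).

    [1, 2) is split into 2**table_bits segments; each segment stores x0**p at
    its midpoint x0, and (1 + u)**p is expanded up to the u**order term.
    """
    if not 1.0 <= x < 2.0:
        raise ValueError("x must be normalized to [1, 2)")
    n_seg = 1 << table_bits
    idx = min(int((x - 1.0) * n_seg), n_seg - 1)
    x0 = 1.0 + (idx + 0.5) / n_seg      # segment midpoint
    y0 = x0 ** p                        # table entry (precomputed in hardware)
    u = (x - x0) / x0                   # small offset, |u| <= 1 / (2 * n_seg)
    acc, coeff, term = 0.0, 1.0, 1.0
    for _k in range(order + 1):
        acc += coeff * term
        coeff *= (p - _k) / (_k + 1)    # generalized binomial coefficient
        term *= u
    return y0 * acc
```

Doubling the table size halves the maximum |u|, while raising the order adds one multiply-accumulate per term; either knob lowers the truncation error, which is the precision/cost trade-off the abstract refers to.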
Funding: Supported by the National Natural Science Foundation of China (No. 60676053)
Abstract: A novel frequency hopping (FH) sequence generator based on the advanced encryption standard (AES) iterated block cipher is proposed for FH communication systems. The analysis shows that FH sequences based on the AES algorithm perform well in uniformity, correlation, complexity, and security. A high-speed, low-power, and low-cost ASIC of the FH sequence generator is implemented by optimizing the structure of the S-Box and MixColumns of the AES algorithm, proposing a hierarchical power management strategy, and applying ...
Abstract: An application specific integrated circuit (ASIC) design of a 1024-point floating-point fast Fourier transform (FFT) processor is presented. It meets the requirement for high-accuracy FFT results in related fields. Several novel design techniques for the floating-point adder and multiplier are introduced in detail; they increase the speed of the system while decreasing power consumption. The hardware area is effectively reduced by an improved butterfly processor. Adopting a pipelined architecture substantially increases the performance of the design, and the regularity of the design makes it easy to realize in very large scale integration (VLSI). Validation results using a field programmable gate array (FPGA) are shown at the end: with the system clock set to 50 MHz, the 1024-point FFT computation takes 204.8 μs.
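For scale, 204.8 μs at 50 MHz corresponds to 204.8 × 10^-6 s × 50 × 10^6 Hz = 10,240 clock cycles, while a 1024-point radix-2 FFT performs log2(1024) = 10 stages of 512 butterflies, or 5,120 butterflies in total. The abstract does not give the radix or butterfly schedule of the hardware, so the Python sketch below is only a generic radix-2 decimation-in-time reference model of the kind often used to check a hardware FFT's output. It shows the butterfly (one twiddle multiply, one sum, one difference) that a butterfly processor implements.

```python
import cmath


def fft_radix2(samples):
    """Iterative radix-2 decimation-in-time FFT (complex double precision)."""
    n = len(samples)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a nonzero power of two")
    a = [complex(s) for s in samples]

    # Bit-reversal permutation of the input order.
    rev = 0
    for i in range(1, n):
        bit = n >> 1
        while rev & bit:
            rev ^= bit
            bit >>= 1
        rev |= bit
        if i < rev:
            a[i], a[rev] = a[rev], a[i]

    # log2(n) stages, each with n/2 butterflies.
    span = 2
    while span <= n:
        w_step = cmath.exp(-2j * cmath.pi / span)
        half = span // 2
        for start in range(0, n, span):
            w = 1 + 0j
            for k in range(half):
                top = a[start + k]
                bottom = a[start + k + half] * w   # twiddle multiply
                a[start + k] = top + bottom         # butterfly sum
                a[start + k + half] = top - bottom  # butterfly difference
                w *= w_step
        span *= 2
    return a
```

A hardware-accuracy study would compare the processor's floating-point output against a model like this one, bin by bin.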
Funding: Supported by the National High Technology Research and Development Program (863 Program) of China (2008AA01Z103)
Abstract: Flexible and efficient implementation of Elliptic Curve Cryptography (ECC) has become increasingly urgent since ECC attained its dominant position in public-key cryptography applications. Based on an analysis of the basic structural features of ECC algorithms, a parallel scheduling algorithm for point addition and point doubling is presented. Building on this scheduling algorithm, the paper also proposes an Application Specific Instruction-Set Co-Processor for ECC that adopts a VLIW architecture. The ECC coprocessor is implemented and validated on an Altera FPGA. The experimental results show that the proposed coprocessor offers high performance and flexibility.
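The abstract does not describe the paper's schedule for point addition and doubling. One well-known way to expose that parallelism, shown here as a swapped-in illustration rather than the authors' algorithm, is the Montgomery ladder. In each step, one addition and one doubling both read only the previous pair (r0, r1), so a VLIW machine could issue them in different slots. The Python sketch uses affine coordinates on a toy curve y^2 = x^3 + 2x + 3 over GF(97); these curve parameters are hypothetical, and `pow(x, -1, m)` requires Python 3.8 or later.

```python
# Toy curve y^2 = x^3 + 2x + 3 over GF(97): hypothetical, for illustration only.
P_MOD, A = 97, 2
INF = None  # point at infinity


def ec_add(p1, p2):
    """Affine point addition; also handles doubling and the point at infinity."""
    if p1 is INF:
        return p2
    if p2 is INF:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return INF  # P + (-P), or doubling a point with y = 0
    if p1 == p2:
        lam = (3 * x1 * x1 + A) * pow(2 * y1, -1, P_MOD) % P_MOD
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, P_MOD) % P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    y3 = (lam * (x1 - x3) - y1) % P_MOD
    return (x3, y3)


def ladder(k, point):
    """Montgomery ladder computing k*point.

    Invariant: r1 = r0 + point. In every step the addition and the doubling
    depend only on the previous (r0, r1), so they are mutually independent.
    """
    r0, r1 = INF, point
    for bit in bin(k)[2:]:
        if bit == "1":
            r0, r1 = ec_add(r0, r1), ec_add(r1, r1)
        else:
            r0, r1 = ec_add(r0, r0), ec_add(r0, r1)
    return r0
```

A production coprocessor would normally use projective coordinates to avoid per-operation inversions; affine formulas keep the sketch short.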
Funding: Supported by the Tianjin Municipal Science and Technology Commission (No. 05YFSYSF01700).
Abstract: To find a design method for a 3D active multichannel silicon microelectrode, a microstructure for an active neural recording system is presented, in which two 2D probes, two integrated circuits, and two spacers are microassembled on a 5 mm × 7 mm silicon platform, so that neural signals from 32 sites can be processed simultaneously. A theoretical model for measuring neural signals with the silicon microelectrode is proposed based on the structure and fabrication process of a single-shank probe. The method of determining the dimensional parameters of the probe shank is discussed from three aspects: the structures of the pallium and endocranium, noise coupled between interconnects, and the mechanical strength of the neural probe. The design criterion is to minimize the size of the neural probe while ensuring that it is stiff enough to pierce the endocranium. The on-chip unity-gain bandpass amplifier has an overall gain of 42 dB over a bandwidth from 60 Hz to 10 kHz, and the DC-baseline stability circuit has a high input resistance, above 30 MΩ, to guarantee a cutoff frequency below 100 Hz. The circuit works in either stimulating or recording mode, and switching between the modes is controlled by the stimulating control signal.
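The numbers in the amplifier description can be checked with two standard relations: a voltage gain of G dB equals a linear ratio of 10^(G/20), so 42 dB is roughly 126 V/V, and a first-order RC high-pass has its corner at f_c = 1/(2πRC). The abstract does not give the baseline circuit's topology. The Python sketch below therefore assumes a first-order RC high-pass in which the 30 MΩ input resistance is R, and computes the smallest hypothetical capacitance C that keeps the corner at or below 100 Hz (about 53 pF under that assumption).

```python
import math


def db_to_voltage_gain(db: float) -> float:
    """Convert a voltage gain in dB to a linear ratio (20*log10 convention)."""
    return 10 ** (db / 20)


def rc_cutoff_hz(r_ohm: float, c_farad: float) -> float:
    """-3 dB corner of an assumed first-order RC high-pass: 1 / (2*pi*R*C)."""
    return 1 / (2 * math.pi * r_ohm * c_farad)


def min_capacitance(r_ohm: float, f_max_hz: float) -> float:
    """Smallest C (hypothetical) keeping the corner at or below f_max_hz."""
    return 1 / (2 * math.pi * r_ohm * f_max_hz)


gain = db_to_voltage_gain(42)          # linear gain of the 42 dB amplifier
c_min = min_capacitance(30e6, 100.0)   # assumes R = 30 MΩ, f_c <= 100 Hz
print(gain, c_min, rc_cutoff_hz(30e6, c_min))
```

Because f_c scales as 1/(RC), a larger input resistance lets the same low corner be reached with a smaller on-chip capacitor, which is why high resistance matters in an area-limited probe IC.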
Abstract: As the traditional RISC+ASIC/ASSP approach to network processor design cannot meet today's requirements, this paper describes an alternative approach, the Reconfigurable Processing Architecture, which boosts performance to the ASIC level while preserving the programmability of traditional RISC-based systems. The paper covers both the hardware architecture and the architecture of the software development environment.
Funding: Supported in part by the Shaanxi Province Key R&D Program (2019ZDLGY12-09), in part by the Higher Education Discipline Innovation 111 Project (B16037), in part by the Shaanxi Innovation Team Project (2018TD-007), and in part by the China National Natural Science Foundation (62102298).
Abstract: The security of cryptographic algorithms based on integer factorization and discrete logarithms will be threatened by quantum computers in the future. In December 2016, the National Institute of Standards and Technology (NIST) began soliciting post-quantum cryptographic (PQC) algorithms worldwide, and CRYSTALS-Kyber was selected as a PQC standard after three rounds of evaluation. Because current implementations consume considerable resources, this paper presents a lightweight architecture for ASICs, together with its implementation on FPGAs for prototyping. In this implementation, a novel compact modular multiplication unit (MMU) and a compression/decompression module are proposed to save hardware resources. We use a specially optimized schoolbook polynomial multiplication (SPM) instead of a number theoretic transform (NTT) core for polynomial multiplication, which reduces SLICE cost by about 74%. We also use signed number representation to save memory resources. In addition, we optimize the hardware implementation of the Hash module, cutting FF consumption by about 48% through register reuse. For prototyping, the design is implemented on a Kintex-7 (XC7K325T-2FFG900I) FPGA, where it occupies 4777/4993 LUTs, 2661/2765 FFs, 1395/1452 SLICEs, 2.5/2.5 BRAMs, and 0/0 DSPs on the client/server side, respectively. The maximum clock frequency reaches 244 MHz. To the best of our knowledge, our design consumes the fewest resources among existing designs, which makes it well suited to resource-constrained devices.
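The two arithmetic pieces named in the abstract are easy to state in software. Kyber works in Z_q[X]/(X^n + 1) with q = 3329 and n = 256. Schoolbook multiplication computes all n^2 coefficient products and folds any term X^(i+j) with i + j ≥ n back with a sign flip, because X^n = -1. Compress_q(x, d) = round(2^d/q · x) mod 2^d and Decompress_q(y, d) = round(q/2^d · y) trade precision for ciphertext size. The Python sketch below is a plain reference model of these definitions, not the paper's optimized SPM hardware or its signed-number representation.

```python
Q = 3329  # Kyber modulus q
N = 256   # Kyber polynomial degree n


def schoolbook_mul(a, b, n=N, q=Q):
    """Multiply a, b in Z_q[X]/(X^n + 1) with the O(n^2) schoolbook method."""
    c = [0] * n
    for i in range(n):
        for j in range(n):
            prod = a[i] * b[j]
            if i + j < n:
                c[i + j] = (c[i + j] + prod) % q
            else:  # X^n = -1: wrap around with a sign flip
                c[i + j - n] = (c[i + j - n] - prod) % q
    return c


def compress(x, d, q=Q):
    """Compress_q(x, d) = round(2^d / q * x) mod 2^d, integers only."""
    return (((x << d) + q // 2) // q) % (1 << d)


def decompress(y, d, q=Q):
    """Decompress_q(y, d) = round(q / 2^d * y), integers only."""
    return (y * q + (1 << (d - 1))) >> d
```

The negacyclic wrap-around is the only step that distinguishes this from ordinary polynomial multiplication. Because q is odd, the integer rounding in `compress` never meets an exact tie.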
Funding: Supported by the National Natural Science Foundation of China (No. 60236020) and the Specialized Research Fund for the Doctoral Program of Higher Education (No. 20050003083)
Abstract: The rapid development of multimedia techniques has increased the demands on multimedia processors. This paper presents a new method for quickly designing high-performance processors for new multimedia applications. In this approach, a configurable processor based on the very long instruction word (VLIW) architecture serves as the basic core, from which designers can easily configure new processor cores for multimedia algorithms. Specific instructions designed for multimedia applications efficiently improve the performance of the target processor. Functions not implemented in the digital signal processor (DSP) core can easily be integrated into the target processor as user-defined hardware to increase performance. Several examples based on the architecture are given. The results show that processor performance improves by approximately 4 times on the H.263 codec, and that the processor outperforms both DSPs and single instruction multiple data (SIMD) multimedia extension architectures by up to 8 times when computing the 2-D IDCT.
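The 2-D IDCT benchmark is a good fit for SIMD and VLIW hardware because it is separable. An 8×8 inverse transform can be computed as eight 1-D IDCTs over the rows followed by eight over the columns, and all sixteen apply the same arithmetic pattern to different data. The abstract does not describe the custom instructions used. The Python sketch below is therefore only a floating-point reference model of the orthonormal 8×8 IDCT in row-column form, and it makes no attempt to reproduce the paper's instruction-level implementation.

```python
import math


def idct_1d(coeffs):
    """8-point 1-D inverse DCT (inverse of the orthonormal DCT-II)."""
    out = []
    for x in range(8):
        s = 0.0
        for u in range(8):
            cu = 1 / math.sqrt(2) if u == 0 else 1.0
            s += cu * coeffs[u] * math.cos((2 * x + 1) * u * math.pi / 16)
        out.append(s / 2)
    return out


def idct_2d(block):
    """Separable 8x8 2-D IDCT: a 1-D IDCT on every row, then on every column.

    block[v][u] holds the coefficient for vertical frequency v and
    horizontal frequency u; the result is indexed as result[y][x].
    """
    rows = [idct_1d(r) for r in block]
    cols = [idct_1d([rows[y][x] for y in range(8)]) for x in range(8)]
    return [[cols[x][y] for x in range(8)] for y in range(8)]
```

Each row pass is independent of the others, so a SIMD datapath can process several rows per instruction; the column pass then needs a transpose, which is often where custom instructions help.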
Funding: Supported by the National Natural Science Foundation of China (60676053)
Abstract: A low-power, low-cost advanced encryption standard (AES) coprocessor is proposed for Zigbee system-on-a-chip (SoC) design. The cost and power consumption of the proposed AES coprocessor are reduced considerably by optimizing the architectures of SubBytes/InvSubBytes and MixColumns/InvMixColumns, by integrating the encryption and decryption procedures through resource sharing, and by using a hierarchical power management strategy based on finite state machine (FSM) and clock gating (CG) technologies. In SMIC 0.18 μm complementary metal oxide semiconductor (CMOS) technology, the AES coprocessor occupies only about 10.5 kgates, its power consumption is 69.1 μW/MHz, and its throughput is 32 Mb/s, which is reasonable and sufficient for a Zigbee system. Compared with other designs, the proposed architecture consumes less power and fewer hardware resources, which benefits Zigbee systems and other portable devices.
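The abstract mentions sharing resources between MixColumns and InvMixColumns but does not say how. One well-known technique, shown here as an illustration and not necessarily the authors' design, uses the fact that the InvMixColumns polynomial {0b}x^3 + {0d}x^2 + {09}x + {0e} equals the MixColumns polynomial multiplied by {04}x^2 + {05} modulo x^4 + 1. Decryption can therefore apply a cheap pre-processing step (two doublings and a few XORs) and then reuse the MixColumns datapath. The Python sketch below implements both transforms on a single 4-byte column.

```python
def xtime(b: int) -> int:
    """Multiply a byte by {02} in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    b <<= 1
    return (b ^ 0x11B) if b & 0x100 else b


def mix_column(col):
    """AES MixColumns on one 4-byte column, using only XORs and xtime."""
    s0, s1, s2, s3 = col
    t = s0 ^ s1 ^ s2 ^ s3
    return [s0 ^ t ^ xtime(s0 ^ s1),
            s1 ^ t ^ xtime(s1 ^ s2),
            s2 ^ t ^ xtime(s2 ^ s3),
            s3 ^ t ^ xtime(s3 ^ s0)]


def inv_mix_column(col):
    """InvMixColumns as multiplication by {04}x^2 + {05}, then MixColumns.

    The pre-step computes s0 ^ {04}(s0 ^ s2), s1 ^ {04}(s1 ^ s3), and so on,
    so decryption reuses the same MixColumns datapath as encryption.
    """
    s0, s1, s2, s3 = col
    u = xtime(xtime(s0 ^ s2))  # {04} * (s0 ^ s2)
    v = xtime(xtime(s1 ^ s3))  # {04} * (s1 ^ s3)
    return mix_column([s0 ^ u, s1 ^ v, s2 ^ u, s3 ^ v])
```

In hardware, the pre-step adds only a small XOR network in front of the shared MixColumns logic, which is cheaper than a separate InvMixColumns unit with {09}, {0b}, {0d}, and {0e} multipliers.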
Funding: Supported by the U.S. National Science Foundation (PHY-1417326, PHY-1719914) and the National Natural Science Foundation of China (11465018)
Abstract: As part of a recent analysis of exclusive two-photon production of W+W- pairs at the LHC, the CMS experiment used di-lepton data to obtain an "effective" photon-photon luminosity. We show how the CMS analysis of their 8 TeV data, together with some assumptions about the likelihood that events in which the proton breaks up pass the selection criteria, can be used to significantly constrain photon parton distribution functions, such as those from the CTEQ, MRST, and NNPDF collaborations. We compare the data with predictions using these photon distributions, as well as the new LUXqed photon distribution. We study the impact of including these data in the NNPDF2.3QED, NNPDF3.0QED, and CT14QEDinc fits. We find that these data provide a useful and complementary cross-check on the photon distribution, which is consistent with the LUXqed prediction while suggesting that the NNPDF photon error band should be significantly reduced. Additionally, we propose a simple model for describing the two-photon production of W+W- at the LHC. Using this model, we constrain the number of inelastic photons that remain after the experimental cuts are applied.
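A photon-photon luminosity built from photon PDFs is the standard parton-luminosity convolution dL/dτ = ∫_τ^1 (dx/x) f(x) f(τ/x), where τ = M^2/s for a two-photon invariant mass M at collider energy √s. The Python sketch below evaluates this integral with the midpoint rule in ln x, where dx/x becomes dy. The density `toy_photon_pdf` is a hypothetical stand-in chosen because the integral then has a closed form, not any real photon PDF. An actual comparison would plug in a fitted photon distribution, and the exclusive CMS measurement additionally involves elastic photon fluxes and selection efficiencies that this sketch ignores.

```python
import math


def two_photon_luminosity(f, tau, steps=2000):
    """dL/dtau = integral_tau^1 dx/x f(x) f(tau/x), midpoint rule in y = ln x."""
    if not 0 < tau < 1:
        raise ValueError("tau must lie in (0, 1)")
    lo = math.log(tau)  # y runs from ln(tau) to 0, and dx/x = dy
    h = -lo / steps
    total = 0.0
    for i in range(steps):
        x = math.exp(lo + (i + 0.5) * h)
        total += f(x) * f(tau / x)
    return total * h


def toy_photon_pdf(x):
    """Hypothetical toy density for illustration, NOT a real photon PDF."""
    return 1.0 - x
```

With the toy density, the integral evaluates analytically to -(1 + τ) ln τ - 2(1 - τ), which makes the numerical routine easy to check before a real PDF is substituted.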