We have proposed a flexible coprocessor key-authentication architecture for 80/112-bit security-related applications over GF(2m)field by employing Elliptic-curve Diffie Hellman(ECDH)protocol.Towards flexibility,a seri...We have proposed a flexible coprocessor key-authentication architecture for 80/112-bit security-related applications over GF(2m)field by employing Elliptic-curve Diffie Hellman(ECDH)protocol.Towards flexibility,a serial input/output interface is used to load/produce secret,public,and shared keys sequentially.Moreover,to reduce the hardware resources and to achieve a reasonable time for cryptographic computations,we have proposed a finite field digit-serial multiplier architecture using combined shift and accumulate techniques.Furthermore,two finite-statemachine controllers are used to perform efficient control functionalities.The proposed coprocessor architecture over GF(2^(163))and GF(2^(233))is programmed using Verilog and then implemented on Xilinx Virtex-7 FPGA(field-programmable-gate-array)device.For GF(2^(163))and GF(2^(233)),the proposed flexible coprocessor use 1351 and 1789 slices,the achieved clock frequency is 250 and 235MHz,time for one public key computation is 40.50 and 79.20μs and time for one shared key generation is 81.00 and 158.40μs.Similarly,the consumed power over GF(2^(163))and GF(2^(233))is 0.91 and 1.37mW,respectively.The proposed coprocessor architecture outperforms state-of-the-art ECDH designs in terms of hardware resources.展开更多
A GF(p) elliptic curve cryptographic coprocessor is proposed and implemented on Field Programmable Gate Array (FPGA). The focus of the coprocessor is on the most critical, complicated and time-consuming point multipli...A GF(p) elliptic curve cryptographic coprocessor is proposed and implemented on Field Programmable Gate Array (FPGA). The focus of the coprocessor is on the most critical, complicated and time-consuming point multiplications. The technique of coordinates conversion and fast multiplication algorithm of two large integers are utilized to avoid frequent inversions and to accelerate the field multiplications used in point multiplications. The characteristic of hardware parallelism is considered in the implementation of point multiplications. The coprocessor implemented on XILINX XC2V3000 computes a point multiplication for an arbitrary point on a curve defined over GF(2192?264?1) with the frequency of 10 MHz in 4.40 ms in the average case and 5.74 ms in the worst case. At the same circumstance, the coprocessor implemented on XILINX XC2V4000 takes 2.2 ms in the average case and 2.88 ms in the worst case.展开更多
This paper describes a microprogrammed architecture for an embedded coprocessor that is able to control IEEE 1149.1 to IEEE 1149.7 test infrastructures, and explains how to expand the supported test command set. The c...This paper describes a microprogrammed architecture for an embedded coprocessor that is able to control IEEE 1149.1 to IEEE 1149.7 test infrastructures, and explains how to expand the supported test command set. The coprocessor uses a fast simplex link (FSL) channel to interface a 32-bit MicroBlaze CPU, but it can work with any microprocessor core that accepts this simple FIFO-based interface method. The implementation cost (logic resource usage for a Xilinx Spartan-6 FPGA) and the performance data (operating frequency) are presented for a test command set comprising two parts: 1) the full IEEE 1149.1 structural test operations;2) a subset of IEEE 1149.7 operations selected to illustrate the implementation of advanced scan formats.展开更多
The paper describes an efficient direct method to solve an equation Ax = b, where A is a sparse matrix, on the Intel®Xeon PhiTM coprocessor. The main challenge for such a system is how to engage all available ...The paper describes an efficient direct method to solve an equation Ax = b, where A is a sparse matrix, on the Intel®Xeon PhiTM coprocessor. The main challenge for such a system is how to engage all available threads (about 240) and how to reduce OpenMP* synchronization overhead, which is very expensive for hundreds of threads. The method consists of decomposing A into a product of lower-triangular, diagonal, and upper triangular matrices followed by solves of the resulting three subsystems. The main idea is based on the hybrid parallel algorithm used in the Intel®Math Kernel Library Parallel Direct Sparse Solver for Clusters [1]. Our implementation exploits a static scheduling algorithm during the factorization step to reduce OpenMP synchronization overhead. To effectively engage all available threads, a three-level approach of parallelization is used. Furthermore, we demonstrate that our implementation can perform up to 100 times better on factorization step and up to 65 times better in terms of overall performance on the 240 threads of the Intel®Xeon PhiTM coprocessor.展开更多
基金This project has received funding by the NSTIP Strategic Technologies program under Grant Number 14-415 ELE1448-10,King Abdul Aziz City of Science and Technology of the Kingdom of Saudi Arabia.
文摘We have proposed a flexible coprocessor key-authentication architecture for 80/112-bit security-related applications over GF(2m)field by employing Elliptic-curve Diffie Hellman(ECDH)protocol.Towards flexibility,a serial input/output interface is used to load/produce secret,public,and shared keys sequentially.Moreover,to reduce the hardware resources and to achieve a reasonable time for cryptographic computations,we have proposed a finite field digit-serial multiplier architecture using combined shift and accumulate techniques.Furthermore,two finite-statemachine controllers are used to perform efficient control functionalities.The proposed coprocessor architecture over GF(2^(163))and GF(2^(233))is programmed using Verilog and then implemented on Xilinx Virtex-7 FPGA(field-programmable-gate-array)device.For GF(2^(163))and GF(2^(233)),the proposed flexible coprocessor use 1351 and 1789 slices,the achieved clock frequency is 250 and 235MHz,time for one public key computation is 40.50 and 79.20μs and time for one shared key generation is 81.00 and 158.40μs.Similarly,the consumed power over GF(2^(163))and GF(2^(233))is 0.91 and 1.37mW,respectively.The proposed coprocessor architecture outperforms state-of-the-art ECDH designs in terms of hardware resources.
基金Supported by the National Natural Science Foun dation of China ( 69973034 ) and the National High TechnologyResearch and Development Program of China (2002AA141050)
文摘A GF(p) elliptic curve cryptographic coprocessor is proposed and implemented on Field Programmable Gate Array (FPGA). The focus of the coprocessor is on the most critical, complicated and time-consuming point multiplications. The technique of coordinates conversion and fast multiplication algorithm of two large integers are utilized to avoid frequent inversions and to accelerate the field multiplications used in point multiplications. The characteristic of hardware parallelism is considered in the implementation of point multiplications. The coprocessor implemented on XILINX XC2V3000 computes a point multiplication for an arbitrary point on a curve defined over GF(2192?264?1) with the frequency of 10 MHz in 4.40 ms in the average case and 5.74 ms in the worst case. At the same circumstance, the coprocessor implemented on XILINX XC2V4000 takes 2.2 ms in the average case and 2.88 ms in the worst case.
文摘This paper describes a microprogrammed architecture for an embedded coprocessor that is able to control IEEE 1149.1 to IEEE 1149.7 test infrastructures, and explains how to expand the supported test command set. The coprocessor uses a fast simplex link (FSL) channel to interface a 32-bit MicroBlaze CPU, but it can work with any microprocessor core that accepts this simple FIFO-based interface method. The implementation cost (logic resource usage for a Xilinx Spartan-6 FPGA) and the performance data (operating frequency) are presented for a test command set comprising two parts: 1) the full IEEE 1149.1 structural test operations;2) a subset of IEEE 1149.7 operations selected to illustrate the implementation of advanced scan formats.
文摘The paper describes an efficient direct method to solve an equation Ax = b, where A is a sparse matrix, on the Intel®Xeon PhiTM coprocessor. The main challenge for such a system is how to engage all available threads (about 240) and how to reduce OpenMP* synchronization overhead, which is very expensive for hundreds of threads. The method consists of decomposing A into a product of lower-triangular, diagonal, and upper triangular matrices followed by solves of the resulting three subsystems. The main idea is based on the hybrid parallel algorithm used in the Intel®Math Kernel Library Parallel Direct Sparse Solver for Clusters [1]. Our implementation exploits a static scheduling algorithm during the factorization step to reduce OpenMP synchronization overhead. To effectively engage all available threads, a three-level approach of parallelization is used. Furthermore, we demonstrate that our implementation can perform up to 100 times better on factorization step and up to 65 times better in terms of overall performance on the 240 threads of the Intel®Xeon PhiTM coprocessor.