Network processors are used in the core node of network to flexibly process packet streams. With the increase of performance, the power of network processor increases fast, and power and cooling become a bottleneck. A...Network processors are used in the core node of network to flexibly process packet streams. With the increase of performance, the power of network processor increases fast, and power and cooling become a bottleneck. Architecture-level power conscious design must go beyond low-level circuit design. Architectural power and performance tradeoff should be considered at the same time. Simulation is an efficient method to design modem network processor before making chip. In order to achieve the tradeoff between performance and power, the processor simulator is used to design the architecture of network processor. Using Netbeneh, Commubench benchmark and processor simulator-SimpleScalar, the performance and power of network processor are quantitatively evaluated. New performance tradeoff evaluation metric is proposed to analyze the architecture of network processor. Based on the high performance lnteI IXP 2800 Network processor eonfignration, optimized instruction fetch width and speed ,instruction issue width, instruction window size are analyzed and selected. Simulation resuits show that the tradeoff design method makes the usage of network processor more effectively. The optimal key parameters of network processor are important in architecture-level design. It is meaningful for the next generation network processor design.展开更多
Matrix multiplication plays a pivotal role in the symmetric cipher algorithms, but it is one of the most complex and time consuming units, its performance directly affects the efficiency of cipher algorithms. Combined...Matrix multiplication plays a pivotal role in the symmetric cipher algorithms, but it is one of the most complex and time consuming units, its performance directly affects the efficiency of cipher algorithms. Combined with the characteristics of VLIW processor and matrix multiplication of symmetric cipher algorithms, this paper extracted the reconfigurable elements and analyzed the principle of matrix multiplication, then designed the reconfigurable architecture of matrix multiplication of VLIW processor further, at last we put forward single instructions for matrix multiplication between 4×1 and 4×4 matrix or two 4×4 matrix over GF(2~8), through the instructions extension, the instructions could support larger dimension operations. The experiment shows that the instructions we designed supports different dimensions matrix multiplication and improves the processing speed of multiplication greatly.展开更多
A detailed experiment of 1-pixel bit reconfigurable ternary optical processor (TOP) is proposed in the paper. 42 basic operation units (BOUs) and 28 typical logic operators of the TOP are realized in the experimen...A detailed experiment of 1-pixel bit reconfigurable ternary optical processor (TOP) is proposed in the paper. 42 basic operation units (BOUs) and 28 typical logic operators of the TOP are realized in the experiment. Results of the test cases elaborately cover the every combination of BOUs and all the nine inputs of ternary processor. Both the experiment process and results analysis are given in this paper. The experimental results demonstrate that the theory of reconfiguring a TOP is valid and that the reconfiguration circuitry is effective.展开更多
In order to eliminate the energy waste caused by the traditional static hardware multithreaded processor used in real-time embedded system working in the low workload situation, the energy efficiency of the hardware m...In order to eliminate the energy waste caused by the traditional static hardware multithreaded processor used in real-time embedded system working in the low workload situation, the energy efficiency of the hardware multithread is discussed and a novel dynamic multithreaded architecture is proposed. The proposed architecture saves the energy wasted by removing idle threads without manipulation on the original architecture, fulfills a seamless switching mechanism which protects active threads and avoids pipeline stall during power mode switching. The report of an implemented dynamic multithreaded processor with 45 nm process from synthesis tool indicates that the area of dynamic multithreaded architecture is only 2.27% higher than the static one in achieving dynamic power dissipation, and consumes 1.3% more power in the same peak performance.展开更多
The paper proposes a low power non-volatile baseband processor with wake-up identification(WUI) receiver for LR-WPAN transceiver.It consists of WUI receiver,main receiver,transmitter,non-volatile memory(NVM) and power...The paper proposes a low power non-volatile baseband processor with wake-up identification(WUI) receiver for LR-WPAN transceiver.It consists of WUI receiver,main receiver,transmitter,non-volatile memory(NVM) and power management module.The main receiver adopts a unified simplified synchronization method and channel codec with proactive Reed-Solomon Bypass technique,which increases the robustness and energy efficiency of receiver.The WUI receiver specifies the communication node and wakes up the transceiver to reduce average power consumption of the transceiver.The embedded NVM can backup/restore the states information of processor that avoids the loss of the state information caused by power failure and reduces the unnecessary power of repetitive computation when the processor is waked up from power down mode.The baseband processor is designed and verified on a FPGA board.The simulated power consumption of processor is 5.1uW for transmitting and 28.2μW for receiving.The WUI receiver technique reduces the average power consumption of transceiver remarkably.If the transceiver operates 30 seconds in every 15 minutes,the average power consumption of the transceiver can be reduced by two orders of magnitude.The NVM avoids the loss of the state information caused by power failure and energy waste caused by repetitive computation.展开更多
Under the direction of design space theory,in this paper we discuss the design of a superscalar pipelining using the way of multiple issues,and the implement of a superscalar based RISC DSP architecture,SDSP.Furthermo...Under the direction of design space theory,in this paper we discuss the design of a superscalar pipelining using the way of multiple issues,and the implement of a superscalar based RISC DSP architecture,SDSP.Furthermore,in this paper we discuss the validity of instruction prefetch,the branch prediction,the depth of instruction window and other issues that can affect the performance of superscalar DSP.展开更多
A kind of pseudo Gray code presentation of test patterns based on accumulation generators is presented and a low power test scheme is proposed to test computational function modules with contiguous subspace in very la...A kind of pseudo Gray code presentation of test patterns based on accumulation generators is presented and a low power test scheme is proposed to test computational function modules with contiguous subspace in very large scale integration (VLSI), especially in digital signal processors (DSP). If test patterns from accumulators for the modules are encoded in the pseudo Gray code presentation, the switching activities of the modules are reduced, and the decrease of the test power consumption is resulted in. Results of experimentation based on FPGA show that the test approach can reduce dynamic power consumption by an average of 17.40% for 8-bit ripple carry adder consisting of 3-2 counters. Then implementation of the low power test in hardware is exploited. Because of the reuse of adders, introduction of additional XOR logic gates is avoided successfully. The design minimizes additional hardware overhead for test and needs no adjustment of circuit structure. The low power test can detect any combinational stuck-at fault within the basic building block without any degradation of original circuit performance.展开更多
基金Sponsored by the National Defence Research Foundation of China(Grant No.413460303).
文摘Network processors are used in the core node of network to flexibly process packet streams. With the increase of performance, the power of network processor increases fast, and power and cooling become a bottleneck. Architecture-level power conscious design must go beyond low-level circuit design. Architectural power and performance tradeoff should be considered at the same time. Simulation is an efficient method to design modem network processor before making chip. In order to achieve the tradeoff between performance and power, the processor simulator is used to design the architecture of network processor. Using Netbeneh, Commubench benchmark and processor simulator-SimpleScalar, the performance and power of network processor are quantitatively evaluated. New performance tradeoff evaluation metric is proposed to analyze the architecture of network processor. Based on the high performance lnteI IXP 2800 Network processor eonfignration, optimized instruction fetch width and speed ,instruction issue width, instruction window size are analyzed and selected. Simulation resuits show that the tradeoff design method makes the usage of network processor more effectively. The optimal key parameters of network processor are important in architecture-level design. It is meaningful for the next generation network processor design.
基金supported in part by open project foundation of State Key Laboratory of Cryptology National Natural Science Foundation of China (NSFC) under Grant No. 61272492, No. 61572521 and No. 61309008Natural Science Foundation for Young of Shaanxi Province under Grant No. 2013JQ8013
文摘Matrix multiplication plays a pivotal role in the symmetric cipher algorithms, but it is one of the most complex and time consuming units, its performance directly affects the efficiency of cipher algorithms. Combined with the characteristics of VLIW processor and matrix multiplication of symmetric cipher algorithms, this paper extracted the reconfigurable elements and analyzed the principle of matrix multiplication, then designed the reconfigurable architecture of matrix multiplication of VLIW processor further, at last we put forward single instructions for matrix multiplication between 4×1 and 4×4 matrix or two 4×4 matrix over GF(2~8), through the instructions extension, the instructions could support larger dimension operations. The experiment shows that the instructions we designed supports different dimensions matrix multiplication and improves the processing speed of multiplication greatly.
基金Project supported by the National Natural Science Foundation of China(Grant No.61073049)the Shanghai Leading Academic Discipline Project(Grant No.J50103)the Doctorate Foundation of Education Ministry of China(Grant No.20093108110016)
文摘A detailed experiment of 1-pixel bit reconfigurable ternary optical processor (TOP) is proposed in the paper. 42 basic operation units (BOUs) and 28 typical logic operators of the TOP are realized in the experiment. Results of the test cases elaborately cover the every combination of BOUs and all the nine inputs of ternary processor. Both the experiment process and results analysis are given in this paper. The experimental results demonstrate that the theory of reconfiguring a TOP is valid and that the reconfiguration circuitry is effective.
基金supported partially by the National High Technical Research and Development Program of China (863 Program) under Grants No. 2011AA040101, No. 2008AA01Z134the National Natural Science Foundation of China under Grants No. 61003251, No. 61172049, No. 61173150+2 种基金the Doctoral Fund of Ministry of Education of China under Grant No. 20100006110015Beijing Municipal Natural Science Foundation under Grant No. Z111100054011078the 2012 Ladder Plan Project of Beijing Key Laboratory of Knowledge Engineering for Materials Science under Grant No. Z121101002812005
文摘In order to eliminate the energy waste caused by the traditional static hardware multithreaded processor used in real-time embedded system working in the low workload situation, the energy efficiency of the hardware multithread is discussed and a novel dynamic multithreaded architecture is proposed. The proposed architecture saves the energy wasted by removing idle threads without manipulation on the original architecture, fulfills a seamless switching mechanism which protects active threads and avoids pipeline stall during power mode switching. The report of an implemented dynamic multithreaded processor with 45 nm process from synthesis tool indicates that the area of dynamic multithreaded architecture is only 2.27% higher than the static one in achieving dynamic power dissipation, and consumes 1.3% more power in the same peak performance.
基金supported in part by the National Natural Science Foundation of China(No.61306027)
文摘The paper proposes a low power non-volatile baseband processor with wake-up identification(WUI) receiver for LR-WPAN transceiver.It consists of WUI receiver,main receiver,transmitter,non-volatile memory(NVM) and power management module.The main receiver adopts a unified simplified synchronization method and channel codec with proactive Reed-Solomon Bypass technique,which increases the robustness and energy efficiency of receiver.The WUI receiver specifies the communication node and wakes up the transceiver to reduce average power consumption of the transceiver.The embedded NVM can backup/restore the states information of processor that avoids the loss of the state information caused by power failure and reduces the unnecessary power of repetitive computation when the processor is waked up from power down mode.The baseband processor is designed and verified on a FPGA board.The simulated power consumption of processor is 5.1uW for transmitting and 28.2μW for receiving.The WUI receiver technique reduces the average power consumption of transceiver remarkably.If the transceiver operates 30 seconds in every 15 minutes,the average power consumption of the transceiver can be reduced by two orders of magnitude.The NVM avoids the loss of the state information caused by power failure and energy waste caused by repetitive computation.
文摘Under the direction of design space theory,in this paper we discuss the design of a superscalar pipelining using the way of multiple issues,and the implement of a superscalar based RISC DSP architecture,SDSP.Furthermore,in this paper we discuss the validity of instruction prefetch,the branch prediction,the depth of instruction window and other issues that can affect the performance of superscalar DSP.
基金supported by the National Natural Science Foundation of China under Grant No.90407007University Science Foundation of China under Grant No R0820207
文摘A kind of pseudo Gray code presentation of test patterns based on accumulation generators is presented and a low power test scheme is proposed to test computational function modules with contiguous subspace in very large scale integration (VLSI), especially in digital signal processors (DSP). If test patterns from accumulators for the modules are encoded in the pseudo Gray code presentation, the switching activities of the modules are reduced, and the decrease of the test power consumption is resulted in. Results of experimentation based on FPGA show that the test approach can reduce dynamic power consumption by an average of 17.40% for 8-bit ripple carry adder consisting of 3-2 counters. Then implementation of the low power test in hardware is exploited. Because of the reuse of adders, introduction of additional XOR logic gates is avoided successfully. The design minimizes additional hardware overhead for test and needs no adjustment of circuit structure. The low power test can detect any combinational stuck-at fault within the basic building block without any degradation of original circuit performance.