To obtain a low-power and compact implementation of the advanced encryption standard (AES) S- box, an asynchronous pipeline architecture over composite field arithmetic was proposed in this paper. In the presented S...To obtain a low-power and compact implementation of the advanced encryption standard (AES) S- box, an asynchronous pipeline architecture over composite field arithmetic was proposed in this paper. In the presented S-box, some improvements were made as follows. (1) Level-sensitive latches were inserted in data path to block the propagation Of the dynamic hazards, which lowered the power of data path circuit. (2) Operations of latches were controlled by latch controllers based on presented asynchronous sequence element: LC-element, which utilized static asymmetric C-element to construct a simple and power-efficient circuit structure. (3) Implementation of the data path circuit was a semi-custom standard-cell circuit on 0.25μm complementary mental oxide semiconductor (CMOS) process; and the full-custom design methodology was adopted in the handshake circuit design. Experimental results show that the resulting circuit achieves nearly 46% improvement with moderate area penalty ( 11.7% ) compared with the related composite field S-box in power performance. The presented S-box circuit can be a hardware intelli-gent property (IP) embedded in the targeted systems such as wireless sensor networks (WSN), smart-cards and radio frequency identification (RFID).展开更多
This paper proposes a circuit structure which can be used for both synchronous and asynchronous pipeline control. It is a self-circulation structure with embedded delay network, and a pipeline can be controlled by thi...This paper proposes a circuit structure which can be used for both synchronous and asynchronous pipeline control. It is a self-circulation structure with embedded delay network, and a pipeline can be controlled by this structure with the interconnection of adjacent stages. This paper first proposes a basic circuit structure, then a linear pipeline is designed with self-circulation structure. The performance of linear pipeline is analyzed, and a 16-bit digital signal processor (DSP) with the structure is designed to prove the validity of the structure. Results show that about 10%-15% power consumption is saved with self-circulation structure compared with synchronous counterpart, while almost the same performance is achieved.展开更多
This paper proposes a new optimization method to improve the performance of a null convention logic asynchronous pipeline.Parallel combinational logic modules in the pipelines can work alternately in null and data cyc...This paper proposes a new optimization method to improve the performance of a null convention logic asynchronous pipeline.Parallel combinational logic modules in the pipelines can work alternately in null and data cycles by using a parallel processing mode.The complete waiting time for both null and data signals of combinational logic output in previous asynchronous register stage is reduced by decoupling the output from combinational logic modules.Performance penalty brought by null cycle is reduced while the data processing capacity is increased.The novel asynchronous pipeline based on asynchronous full adders with different bit widths as asynchronous combination logic modules is simulated using 0.18-μm CMOS technology.Based on 6 bits asynchronous adder as asynchronous combination logic modules, the simulation result of this new pipeline proposal demonstrates a high throughput up to 72.4% improvement with appropriate power consumption.This indicates the new design proposal is preferable for high-speed as ynchronous designs due to its high throughput and delay-insensitivity.展开更多
This paper proposes an asynchronous complex pipeline based on ARM-V3 instruction set. Muller pipeline structure is used as prototype, and the factors which may affect pipeline performance are analyzed. To balance the ...This paper proposes an asynchronous complex pipeline based on ARM-V3 instruction set. Muller pipeline structure is used as prototype, and the factors which may affect pipeline performance are analyzed. To balance the difficulty of asynchronous design and performance analysis, both complete asynchronous and partial asynchronous structures aere designed and compared. Results of comparison with the well-Rnown industrial product ARM922T verify that about 30% and 40% performance improvement of the partiM and complete asynchronous complex pipelines can be obtained respectively. The design methodologies can also be used in the design of other asynchronous pipelines.展开更多
As VLSI technology enters the post-Moore era, there has been an increasing interest in asynchronous design because of its potential advantages in power consumption, electromagnetic emission, and automatic speed scalin...As VLSI technology enters the post-Moore era, there has been an increasing interest in asynchronous design because of its potential advantages in power consumption, electromagnetic emission, and automatic speed scaling capacity under supply voltage variations. In most practical asynchronous circuits, a pipeline forms the micro-architecture backbone, and its characteristics play a vital role in determining the overall circuit performance.In this paper, we investigate a series of typical asynchronous pipeline circuits based on bundled-data encoding,spanning different handshake signaling protocols such as 2-phase(micropipeline, Mousetrap, and Click), 4-phase(simple, semi-decoupled, and fully-decoupled), and single-track(GasP). An in-depth review of each selected circuit is conducted regarding the handshaking and data latching mechanisms behind the circuit implementations, as well as the analysis of its performance and timing constraints based on formal behavior models. Overall, this paper aims at providing a survey of asynchronous bundled-data pipeline circuits, and it will be a reference for designers interested in experimenting with asynchronous circuits.展开更多
Recently, a method known as pipeline stage unification (PSU) has been proposed to alleviate the increasing energy consumption problem in modern microprocessors. PSU achieves a high energy efficiency by employing a c...Recently, a method known as pipeline stage unification (PSU) has been proposed to alleviate the increasing energy consumption problem in modern microprocessors. PSU achieves a high energy efficiency by employing a changeable pipeline depth and its working scheme is eligible for a fine control method. In this paper, we propose a dynamic method to study fine-grained program interval behaviors based on some easy-to-get runtime processor metrics. Using this method to determine the proper PSU configurations during the program execution, we are able to achieve an averaged 13.5% energydelay-product (EDP) reduction for SPEC CPU2000 integer benchmarks, compared to the baseline processor. This value is only 0.14% larger than the theoretically idealized controlling. Our hardware synthesis result indicates that the proposed method can largely decrease the hardware overhead in both area and delay costs, as compared to a previous program study method which is based on working set signatures.展开更多
基金the National High Technology Research and Development Programme of China(Grant No2006AA01Z226)the Project(Grant No2006Z001B)the Scientific Research Foundation of Huazhong University of Science and Technology
文摘To obtain a low-power and compact implementation of the advanced encryption standard (AES) S- box, an asynchronous pipeline architecture over composite field arithmetic was proposed in this paper. In the presented S-box, some improvements were made as follows. (1) Level-sensitive latches were inserted in data path to block the propagation Of the dynamic hazards, which lowered the power of data path circuit. (2) Operations of latches were controlled by latch controllers based on presented asynchronous sequence element: LC-element, which utilized static asymmetric C-element to construct a simple and power-efficient circuit structure. (3) Implementation of the data path circuit was a semi-custom standard-cell circuit on 0.25μm complementary mental oxide semiconductor (CMOS) process; and the full-custom design methodology was adopted in the handshake circuit design. Experimental results show that the resulting circuit achieves nearly 46% improvement with moderate area penalty ( 11.7% ) compared with the related composite field S-box in power performance. The presented S-box circuit can be a hardware intelli-gent property (IP) embedded in the targeted systems such as wireless sensor networks (WSN), smart-cards and radio frequency identification (RFID).
基金Sponsored by the National High Technology Research and Development Program of China(Grant No.2003AA1Z1350)
文摘This paper proposes a circuit structure which can be used for both synchronous and asynchronous pipeline control. It is a self-circulation structure with embedded delay network, and a pipeline can be controlled by this structure with the interconnection of adjacent stages. This paper first proposes a basic circuit structure, then a linear pipeline is designed with self-circulation structure. The performance of linear pipeline is analyzed, and a 16-bit digital signal processor (DSP) with the structure is designed to prove the validity of the structure. Results show that about 10%-15% power consumption is saved with self-circulation structure compared with synchronous counterpart, while almost the same performance is achieved.
基金supported by the National Science Fund for Distinguished Young Scholars (No. 60725415)the National Natural Science Foundation of China (Nos. 60676009, 90407016)
文摘This paper proposes a new optimization method to improve the performance of a null convention logic asynchronous pipeline.Parallel combinational logic modules in the pipelines can work alternately in null and data cycles by using a parallel processing mode.The complete waiting time for both null and data signals of combinational logic output in previous asynchronous register stage is reduced by decoupling the output from combinational logic modules.Performance penalty brought by null cycle is reduced while the data processing capacity is increased.The novel asynchronous pipeline based on asynchronous full adders with different bit widths as asynchronous combination logic modules is simulated using 0.18-μm CMOS technology.Based on 6 bits asynchronous adder as asynchronous combination logic modules, the simulation result of this new pipeline proposal demonstrates a high throughput up to 72.4% improvement with appropriate power consumption.This indicates the new design proposal is preferable for high-speed as ynchronous designs due to its high throughput and delay-insensitivity.
基金the Research Project of China Military Department (No. 6130325)
文摘This paper proposes an asynchronous complex pipeline based on ARM-V3 instruction set. Muller pipeline structure is used as prototype, and the factors which may affect pipeline performance are analyzed. To balance the difficulty of asynchronous design and performance analysis, both complete asynchronous and partial asynchronous structures aere designed and compared. Results of comparison with the well-Rnown industrial product ARM922T verify that about 30% and 40% performance improvement of the partiM and complete asynchronous complex pipelines can be obtained respectively. The design methodologies can also be used in the design of other asynchronous pipelines.
基金supported in part by the Hainan Academician Innovation Platform (No. YSPTZX202036)in part by the Hainan Natural Science Foundation (No. 619MS054)。
文摘As VLSI technology enters the post-Moore era, there has been an increasing interest in asynchronous design because of its potential advantages in power consumption, electromagnetic emission, and automatic speed scaling capacity under supply voltage variations. In most practical asynchronous circuits, a pipeline forms the micro-architecture backbone, and its characteristics play a vital role in determining the overall circuit performance.In this paper, we investigate a series of typical asynchronous pipeline circuits based on bundled-data encoding,spanning different handshake signaling protocols such as 2-phase(micropipeline, Mousetrap, and Click), 4-phase(simple, semi-decoupled, and fully-decoupled), and single-track(GasP). An in-depth review of each selected circuit is conducted regarding the handshaking and data latching mechanisms behind the circuit implementations, as well as the analysis of its performance and timing constraints based on formal behavior models. Overall, this paper aims at providing a survey of asynchronous bundled-data pipeline circuits, and it will be a reference for designers interested in experimenting with asynchronous circuits.
基金supported by VLSI Design and Education Center (VDEC),University of Tokyo with the collaboration with Synopsys Corporation
文摘Recently, a method known as pipeline stage unification (PSU) has been proposed to alleviate the increasing energy consumption problem in modern microprocessors. PSU achieves a high energy efficiency by employing a changeable pipeline depth and its working scheme is eligible for a fine control method. In this paper, we propose a dynamic method to study fine-grained program interval behaviors based on some easy-to-get runtime processor metrics. Using this method to determine the proper PSU configurations during the program execution, we are able to achieve an averaged 13.5% energydelay-product (EDP) reduction for SPEC CPU2000 integer benchmarks, compared to the baseline processor. This value is only 0.14% larger than the theoretically idealized controlling. Our hardware synthesis result indicates that the proposed method can largely decrease the hardware overhead in both area and delay costs, as compared to a previous program study method which is based on working set signatures.