As a nanometer-level interconnection, the Optical Network-on-Chip (ONoC) was proposed for its low latency, high bandwidth, and power efficiency. Compared with a 2-dimensional (2D) design, 3D integration offers higher packing density and shorter wire length, so the 3D ONoC has great potential. In this paper, we first discuss existing ONoC research and then design mesh and torus ONoCs from the perspectives of topology, router, and routing module, with the help of 3D integration. A simulation platform is established using OPNET to compare the performance of 2D and 3D ONoCs in terms of average delay and packet loss rate. The performance of 3D mesh and 3D torus ONoCs is also compared. The simulation results demonstrate that 3D integration reduces average delay and packet loss rate, and that the 3D torus ONoC performs better than the 3D mesh solution. Finally, we summarize some future challenges with possible solutions, including microcosmic routing inside optical routers and highly efficient traffic grooming.
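The topology comparison in this abstract can be illustrated with a rough hop-count calculation. The sketch below is not the OPNET model from the paper: it assumes minimal routing and uniform traffic, and the function name `avg_hops` and the 64-node network sizes are illustrative choices. Hop count is only a proxy for delay.

```python
from itertools import product


def avg_hops(dims, wrap):
    """Average minimal hop count over all ordered pairs of distinct nodes.

    dims: size of the network along each dimension, e.g. (4, 4, 4).
    wrap: True for a torus (wrap-around links), False for a mesh.
    """
    nodes = list(product(*(range(d) for d in dims)))
    total = 0
    for a in nodes:
        for b in nodes:
            for x, y, d in zip(a, b, dims):
                delta = abs(x - y)
                # On a torus the shorter way around the ring is taken.
                total += min(delta, d - delta) if wrap else delta
    pairs = len(nodes) * (len(nodes) - 1)
    return total / pairs


# Three 64-node networks (sizes are an assumption for illustration).
mesh_2d = avg_hops((8, 8), wrap=False)
mesh_3d = avg_hops((4, 4, 4), wrap=False)
torus_3d = avg_hops((4, 4, 4), wrap=True)
print(f"2D mesh {mesh_2d:.2f}, 3D mesh {mesh_3d:.2f}, 3D torus {torus_3d:.2f}")
```

Under these assumptions the 3D torus has the lowest average hop count, followed by the 3D mesh and then the 2D mesh, which matches the direction of the reported results.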
Modulating both the clock frequency and the supply voltage of the network-on-chip (NoC) at runtime can reduce power consumption and heat flux, but it increases NoC latency. It is therefore necessary to find a tradeoff between power consumption and communication latency. We propose an analytical latency model that captures the relationship between the two. The model is based on the M/G/1 queuing model, which is suitable for dynamic frequency scaling. The experimental results show that the accuracy of this model is more than 90%.
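To make the latency side of this tradeoff concrete, the sketch below evaluates the textbook Pollaczek-Khinchine mean-value formula for a single M/G/1 queue. It is not the model proposed in the paper; the function names, the per-packet cycle count, and the injection rate are hypothetical values chosen for illustration.

```python
def mg1_sojourn_time(arrival_rate, mean_service, service_scv):
    """Mean time in system for an M/G/1 queue (Pollaczek-Khinchine).

    arrival_rate: packets per second (Poisson arrivals).
    mean_service: mean service time in seconds.
    service_scv:  squared coefficient of variation of the service time
                  (0 for deterministic, 1 for exponential service).
    """
    rho = arrival_rate * mean_service  # utilisation
    if not 0 <= rho < 1:
        raise ValueError("queue is unstable: utilisation must be below 1")
    wait = rho * mean_service * (1 + service_scv) / (2 * (1 - rho))
    return wait + mean_service


def router_latency(freq_hz, cycles_per_packet, arrival_rate, service_scv=0.0):
    """Latency of one router port when its clock runs at freq_hz.

    Assumes the service time is cycles_per_packet clock cycles, so
    lowering the frequency stretches the service time proportionally.
    """
    return mg1_sojourn_time(arrival_rate, cycles_per_packet / freq_hz, service_scv)


# Hypothetical numbers: 4 cycles per packet, 1e8 packets/s injected.
fast = router_latency(1.0e9, 4, 1.0e8)
slow = router_latency(0.5e9, 4, 1.0e8)
print(f"1 GHz: {fast * 1e9:.2f} ns, 0.5 GHz: {slow * 1e9:.2f} ns")
```

With these numbers, halving the clock frequency more than doubles the latency, because the utilisation rises from 0.4 to 0.8 and queueing delay grows nonlinearly.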
A variation-aware task mapping approach is proposed for multi-core networks-on-chip with redundant cores, which includes both design-time mapping and run-time scheduling algorithms. First, a genetic task mapping algorithm is applied at design time to generate multiple task mapping solutions that cover a maximum range of chips. Then, at run time, one optimal task mapping solution is selected, and logical cores are mapped to physically available cores. Both core asymmetry and topological changes are considered in the proposed approach. Experimental results show that the performance yield of the proposed approach is 96% on average, and that the communication cost, power consumption, and peak temperature are all optimized without loss of performance yield.
A dual-channel access mechanism is proposed to overcome the drawback of the traditional single-channel access mechanism for network-on-chip (NoC). In the traditional single-channel access mechanism, every intellectual property (IP) core has only one channel to access the on-chip network. When the network is relatively idle, the injection rate is too small to make good use of the network resources. When the network is relatively busy, the ejection rate is so small that packets cannot leave the network immediately, which increases the probability of congestion. In the dual-channel access mechanism, the injection rate of the IP core and the ejection rate of the network are increased by using two optional channels in the network interface (NI) and the local port of the routers, which improves communication performance. Experimental results show that, compared with the traditional single-channel access mechanism, the proposed scheme greatly increases throughput and cuts down average latency with a reasonable area increase.
This paper introduces a new datapath architecture for reconfigurable processors. The proposed datapath is based on the Network-on-Chip approach and facilitates tight coupling of all functional units. Reconfigurable functional elements can be dynamically allocated for application-specific optimizations, enabling polymorphic computing. Using a modified network simulator, the performance of several NoC topologies and parameters is investigated with standard benchmark programs, including fine-grain and coarse-grain computations. Simulation results highlight the flexibility and scalability of the proposed polymorphic NoC processor for a wide range of application domains.
As the number of on-chip devices increases further, the bus structure no longer meets the requirements. To improve communication between the parts of a chip, designers need to explore a new structure for interconnecting on-chip devices. This paper proposes a dynamic and adaptive network-on-chip algorithm that takes a 2-dimensional mesh NoC platform as the carrier, incorporates communication energy consumption and delay into a unified cost function, and uses ant colony optimization to produce an NoC mapping oriented to energy consumption and delay. The experiments indicate that, compared with random mapping, single-objective optimization saves 30% to 47% of communication energy consumption and 20% to 39% of execution time, respectively, and that joint-objective optimization can further exploit the potential of the time dimension in a mapping scheme dominated by energy.
This paper introduces Twist-routing, a new routing algorithm for faulty on-chip networks. It improves on Maze-routing, a face-routing-based algorithm that uses deflections in routing, and achieves full fault coverage and fast packet delivery. To build the Twist-routing algorithm, we use bounding circles, borrowing the idea from the GOAFR+ routing algorithm for ad-hoc wireless networks. Unlike Maze-routing, whose path length is unbounded even when the optimal path length is fixed, Twist-routing bounds the path length by the cube of the optimal path length. Our evaluations show that Twist-routing delivers packets up to 35% faster than Maze-routing under uniform traffic and an Erdős-Rényi failure model, as the failure rate and the injection rate vary.
Network-on-Chip (NoC) is widely adopted in neuromorphic processors to support communication between neurons in spiking neural networks (SNNs). However, SNNs generate enormous numbers of spiking packets due to the one-to-many traffic pattern, and these packets may put communication pressure on the NoC. We propose a path-based multicast routing method to alleviate the pressure. First, all destination nodes of each source node on the NoC are divided into several clusters. Second, multicast paths in the clusters are created based on the Hamiltonian path algorithm. The proposed routing can reduce the path length and balance the communication load of each router. Last, we design a lightweight NoC microarchitecture, which involves a customized multicast packet and a routing function. We use six datasets to verify the proposed multicast routing. Compared with unicast routing, path-based multicast routing achieves a 5.1x speedup in running time, and the number of hops and the maximum transmission latency are reduced by 68.9% and 77.4%, respectively. The maximum path length is reduced by 68.3% and 67.2% compared with dual-path (DP) and multi-path (MP) multicast routing, respectively. Therefore, the proposed multicast routing has improved performance in terms of average latency and throughput compared with DP or MP multicast routing.
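Path-based multicast of this kind builds on a Hamiltonian labeling of the mesh. The sketch below shows the classic snake labeling and the destination ordering used by the dual-path (DP) baseline mentioned in the abstract. It does not reproduce the paper's clustering step; the function names and the 4×4 mesh are assumptions for illustration.

```python
def snake_label(x, y, width):
    """Label node (x, y) along a snake-shaped Hamiltonian path.

    Even rows run left to right and odd rows right to left, so nodes
    with consecutive labels are always neighbours in the mesh.
    """
    return y * width + (x if y % 2 == 0 else width - 1 - x)


def dual_path_order(src, dests, width):
    """Split destinations into two visiting orders around the source.

    Returns (high, low): destinations with labels above the source in
    ascending label order, and those below it in descending order.
    One multicast packet is then sent along each list.
    """
    def label(node):
        return snake_label(node[0], node[1], width)

    s = label(src)
    high = sorted((d for d in dests if label(d) > s), key=label)
    low = sorted((d for d in dests if label(d) < s), key=label, reverse=True)
    return high, low


# Hypothetical 4x4 mesh with the source at (1, 1).
high, low = dual_path_order((1, 1), [(0, 0), (3, 0), (3, 1), (0, 2), (2, 3)], 4)
print(high, low)
```

Because each packet only ever moves toward higher (or lower) labels, the two paths cannot form a cyclic dependency with each other, which is the usual motivation for Hamiltonian-path multicast.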
First-Input-First-Output (FIFO) buffers are extensively used in contemporary digital processors and Systems-on-Chip (SoCs). There are synchronous FIFOs and asynchronous FIFOs, and FIFOs of different sizes should be implemented in different ways. FIFOs are used not only for pipeline design within a processor and for inter-processor communication networks such as Networks-on-Chip (NoCs), but also for peripherals and clock domain crossing at the whole-SoC level. In this paper, we review the interface, the circuit implementation, and the various usages of FIFOs at various levels of digital design. We find that FIFOs can greatly facilitate signal storage, signal decoupling, signal transfer, power domain separation, and power domain crossing in digital systems. We hope that more attention is paid to the usage of synchronous and asynchronous FIFOs and that more sophisticated usages are discovered by the digital design community.
Network-on-chip (NoC) technology enables a new system-on-chip paradigm, the system-on-network-on-chip (SoNoC) paradigm. One of the challenges in designing application-specific networks is modeling the on-chip system behavior and determining on-chip traffic characteristics. A universal object message level model for SoNoC was defined and an object-oriented methodology was developed to implement this model in hardware and software. The model supports "object to core" synthesis and "function invoking to network" mapping. A case study of an H.263 system verifies the model and methodology. System prototypes are easily built and on-chip traffic can be observed using the SoNoC model to provide real benchmarks for on-chip network design.
Dataflow architecture has shown its advantages in many high-performance computing cases. In dataflow computing, large amounts of data are frequently transferred among processing elements through the network-on-chip (NoC), so the router design has a significant impact on the performance of dataflow architecture. Common routers are designed for control-flow multi-core architectures, and we find that they are not suitable for dataflow architecture. In this work, we analyze and extract the features of data transfers in the NoCs of dataflow architecture: multiple destinations, high injection rate, and performance sensitivity to delay. Based on these three features, we propose a novel and efficient NoC router for dataflow architecture. The proposed router supports multiple destinations, so it can deliver data to several destinations in a single transfer. Moreover, the router adopts output buffers to maximize throughput and non-flit packets to minimize transfer delay. Experimental results show that the proposed router can improve the performance of dataflow architecture by 3.6x over a state-of-the-art router.
Network-on-Chip (NoC), with its excellent scalability and high bandwidth, is considered the most promising communication architecture for complex integrated systems. However, NoC reliability is increasingly challenged by shrinking semiconductor feature sizes and increasing integration density. Moreover, a single node failure in an NoC might destroy the network connectivity and corrupt the entire system. Introducing redundancy is an efficient way to construct a resilient communication path. However, prior redundancy-based work either achieves limited reliability with coarse-grain protection or incurs even larger hardware overhead with fine-grain protection. In this paper, we observe that the data path in an NoC, such as links, buffers, and crossbars, can be divided into multiple identical parallel slices, which can be used as inherent redundancy to enhance reliability. As long as one fault-free slice is left available, the proposed salvaging scheme, named RevivePath, can keep the overall data path functional. Furthermore, RevivePath uses direct redundancy to protect the control path, such as the switch arbiter and routing computation, to provide a full fault-tolerant scheme for the whole router. Experimental results show that it achieves high reliability with graceful performance degradation even under a high fault rate.
Large transmission power consumption and excessive interconnection lines are two shortcomings of conventional networks-on-chip. To improve performance in these areas, this paper proposes a fully asynchronous serial transmission converter for networks-on-chip. By grouping the parallel data between routers into smaller data blocks, the number of interconnection lines between routers can be greatly reduced, which in turn saves power overhead in the transmission process. Null convention logic units are used to make the circuit quasi-delay insensitive and highly robust. The proposed serial transmission converter and serial channel are implemented in SMIC 0.18 μm standard CMOS technology. Results demonstrate that this fully asynchronous serial transmission converter can save up to three quarters of the interconnection line resources and reduce power consumption by up to two-thirds for 32-bit data widths. The proposed converter is applicable to on-chip networks that are sensitive to area and power.
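The data-grouping idea can be sketched in software as follows. This models only the splitting of a parallel word into smaller blocks and its reassembly; it says nothing about the asynchronous null convention logic circuit itself. The 8-bit block width is an assumption, chosen because it would cut a 32-bit link to one quarter of its lines, and the function names are hypothetical.

```python
def to_blocks(word, width=32, block=8):
    """Split a width-bit word into blocks, least significant block first."""
    if width % block != 0:
        raise ValueError("block size must divide the word width")
    if not 0 <= word < (1 << width):
        raise ValueError("word does not fit in the given width")
    mask = (1 << block) - 1
    return [(word >> shift) & mask for shift in range(0, width, block)]


def from_blocks(blocks, block=8):
    """Reassemble a word from blocks produced by to_blocks."""
    word = 0
    for i, value in enumerate(blocks):
        word |= value << (i * block)
    return word


# A 32-bit word crosses the link as four 8-bit transfers.
blocks = to_blocks(0xDEADBEEF)
print([hex(b) for b in blocks], hex(from_blocks(blocks)))
```

The tradeoff is that each word now takes several transfers instead of one, so the line savings are paid for with serialization latency.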
As technology shrinks to the nanometer scale, network-on-chip (NOC) has become a reasonable solution for connecting many IP blocks on a single chip. However, it suffers from both crosstalk effects and single event upsets (SEUs), especially crosstalk-induced delay, which may constrain the overall performance of the NOC. In this paper, we introduce a reliable NOC design using a code capable of both crosstalk avoidance and single error correction. This code, named selected crosstalk avoidance code (SCAC) in our previous work, combines crosstalk avoidance code (CAC) and error correction code (ECC) through codeword selection from an original CAC codeword set. It can handle errors caused by either crosstalk effects or SEUs. In the reliable NOC design, data are encoded into SCAC codewords and can be transmitted rapidly and reliably across the NOC. Experimental results show that the NOC design with SCAC achieves higher performance and reliably tolerates single errors. Compared with previous crosstalk avoidance methods, SCAC reduces wire overhead, power dissipation, and total delay. When SCAC is used in an NOC, it can save 20% of area overhead and reduce power dissipation by 49%.
To ensure the reliability of network-on-chip (NoC) under faulty conditions, a dynamic fault-tolerant routing algorithm is proposed. The algorithm can perform detour routing when there are both static and dynamic permanent faults in the network, meaning that a packet can move around the faults to its destination along a non-minimal path. In addition, a multi-level congestion control mechanism allows the algorithm to distribute the load over the whole network and to avoid hotspots around the faults. Simulation results demonstrate the advantage of the proposed routing algorithm in terms of average packet latency and packet loss rate compared with the negative-first routing algorithm and the DyAD routing algorithm in the presence of permanent faults. The proposed algorithm achieves much lower average packet latency and a packet loss rate of less than 20%.
To address two shortcomings of conventional networks-on-chip, namely low utilization of the channels between routers and excessive interconnection lines, this paper proposes a fully asynchronous self-adaptive bi-directional transmission channel. It uses interconnection lines and register resources with high efficiency and dynamically detects the data transmission state between routers through a direction regulator, which controls the sequencer to automatically adjust the transmission direction of the bi-directional channel and thus provides a flexible data transmission environment. Null convention logic units are used to make the circuit quasi-delay insensitive and highly robust. The proposed bi-directional transmission channel is implemented in SMIC 0.18 μm standard CMOS technology. Post-layout simulation results demonstrate that this self-adaptive bi-directional channel performs better in throughput, transmission flexibility, and channel bandwidth utilization than a conventional single-direction channel. Moreover, the proposed channel can save up to 30% of the interconnection lines and can provide twice the bandwidth resources of a single-direction transmission channel. The proposed channel is applicable to on-chip networks with limited register and interconnection line resources.
With the rapid development of the semiconductor industry, the number of cores integrated on a chip is increasing quickly, which brings tough challenges such as bandwidth, scalability, and power to on-chip interconnection. Against this background, Network-on-Chip (NoC) was proposed and is gradually replacing traditional on-chip interconnections such as the shared bus and the crossbar. For the convenience of physical layout, mesh is the most widely used topology in NoC design. The routing algorithm, which decides the paths of packets, has a significant impact on the latency and throughput of the network and thus plays a vital role in a well-performing network. This study mainly focuses on the routing algorithms of mesh NoCs. Depending on whether network information is taken into consideration in the routing decision, NoC routing algorithms can be roughly classified into oblivious routing and adaptive routing. Oblivious routing costs less but lacks adaptiveness, while adaptive routing is the opposite. To combine the advantages of oblivious and adaptive routing algorithms, half-adaptive algorithms were proposed. In this paper, the concepts, taxonomy, and features of NoC routing algorithms are introduced. Then the importance of routing algorithms in mesh NoCs is highlighted, and representative routing algorithms with their respective features are reviewed and summarized. Finally, we try to shed light on future work on NoC routing algorithms.
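As a concrete instance of the oblivious class described in this abstract, the sketch below implements XY dimension-order routing on a 2D mesh, a widely used deterministic scheme in which the path depends only on the source and destination coordinates. The function name and coordinate convention are illustrative and not taken from the survey.

```python
def xy_route(src, dst):
    """Return the list of nodes visited by XY routing on a 2D mesh.

    The packet first travels along the X dimension until its column
    matches the destination, then along the Y dimension. No network
    state (congestion, faults) is consulted, so the route is oblivious.
    """
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


print(xy_route((0, 0), (2, 1)))  # [(0, 0), (1, 0), (2, 0), (2, 1)]
```

An adaptive algorithm would instead choose among several permitted next hops using information such as downstream buffer occupancy, at the cost of extra logic in each router.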
An analytical model is proposed for Network-on-Chip (NoC) with an input-buffered router architecture and finite-size buffers. The model is developed based on M/G/1/K queuing theory and takes into consideration the restriction of buffer sizes in the NoC. It analyzes the packet's sojourn time in each buffer and calculates the average packet latency in the NoC. The validity of the model is verified through simulation. By comparing our analytical outcomes to the simulation results, we show that the proposed model successfully captures the performance characteristics of the NoC, which provides an efficient performance analysis tool for NoC design.
To meet the demand for high on-chip network performance, flexible routing algorithms that supply path diversity and alleviate congestion are required. We propose the CAOE-FA router, which combines congestion awareness and fair arbitration. Buffer occupancies from downstream neighbors are collected to indicate congestion levels. Among the candidate outputs permitted by the odd-even (OE) turn model, the most lightly loaded direction is selected. When candidates have the same congestion level, fair arbitration replaces random selection. Experimental results show that CAOE-FA can reduce the average packet latency by up to 22.18% and improve the network throughput by up to 68.58%, with negligible hardware cost.
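The selection step described in this abstract can be sketched as follows. The candidate outputs are taken as given (computing them from the odd-even turn model is outside this sketch), and round-robin is used as one possible fair arbitration policy, which is an assumption because the abstract does not specify the policy. The class and parameter names are hypothetical.

```python
class OutputSelector:
    """Pick the least congested candidate output; break ties round-robin."""

    def __init__(self, ports=("N", "E", "S", "W")):
        self.ports = list(ports)
        self.pointer = 0  # index of the port with top priority on the next tie

    def select(self, candidates, occupancy):
        """Choose an output port.

        candidates: ports permitted by the turn model (assumed given).
        occupancy:  downstream buffer occupancy per port, lower is better.
        """
        allowed = [p for p in candidates if p in self.ports]
        if not allowed:
            raise ValueError("no valid candidate output port")
        lightest = min(occupancy[p] for p in allowed)
        tied = {p for p in allowed if occupancy[p] == lightest}
        n = len(self.ports)
        for offset in range(n):
            idx = (self.pointer + offset) % n
            if self.ports[idx] in tied:
                if len(tied) > 1:
                    # Rotate priority only when a tie was actually broken.
                    self.pointer = (idx + 1) % n
                return self.ports[idx]
        raise AssertionError("unreachable: tied is a non-empty subset of ports")


selector = OutputSelector()
occupancy = {"N": 2, "E": 2, "S": 5, "W": 0}
picks = [selector.select(["N", "E"], occupancy) for _ in range(3)]
print(picks)
```

With equal occupancy on N and E, successive decisions alternate between the two ports instead of depending on a random draw.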
A low-loss, non-blocking, scalable passive optical interconnect network-on-chip (LOOKNoC) structure is proposed based on 2×2 optical exchange switches. It uses wavelength division multiplexing (WDM) technology to expand to 8×8, 16×16, 32×32, and 64×64 passive optical interconnection networks, which can achieve non-blocking communication. The experimental results show that, for the 16×16 optical interconnection network structure, the number of microring resonators (MRs) in LOOKNoC is reduced by 90.9%, 90.9%, 20.0%, and 75.0% compared with the generic wavelength-routed optical router (GWOR), λ-router, topology, and CrossBar structures. Performance tests of the 16×16 structure on the OMNeT++ platform show that the average insertion loss of LOOKNoC is 3.0%, 11.6%, 4.8%, and 16.7% less than that of the GWOR, λ-router, Mesh, and CrossBar structures.
Funding: Supported in part by the National Natural Science Foundation of China (Grant Nos. 61401082, 61471109, 61502075, 61672123, 91438110, U1301253), the Fundamental Research Funds for Central Universities (Grant Nos. N161604004, N161608001, N150401002, DUT15RC(3)009), the Liaoning Bai Qian Wan Talents Program, and the National High-Level Personnel Special Support Program for Youth Top-Notch Talent.
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 61376024 and 61306024, the Natural Science Foundation of Guangdong Province under Grant No. S2013040014366, and the Basic Research Programme of Shenzhen under Grant Nos. JCYJ20140417113430642 and JCYJ20140901003939020.
Funding: Supported by the High Technology Research and Development Program of Fujian Province (2010HZ0004-1, 2009HZ0003-1).
Funding: Supported by the National Key Research and Development Program of China under Grant Nos. 2018YFB2202-603 and 2020AAA0104602.
文摘Network-on-Chip(NoC)is widely adopted in neuromorphic processors to support communication between neurons in spiking neural networks(SNNs).However,SNNs generate enormous spiking packets due to the one-to-many traffic pattern.The spiking packets may cause communication pressure on NoC.We propose a path-based multicast routing method to alleviate the pressure.Firstly,all destination nodes of each source node on NoC are divided into several clusters.Secondly,multicast paths in the clusters are created based on the Hamiltonian path algorithm.The proposed routing can reduce the length of path and balance the communication load of each router.Lastly,we design a lightweight microarchitecture of NoC,which involves a customized multicast packet and a routing function.We use six datasets to verify the proposed multicast routing.Compared with unicast routing,the running time of path-based multicast routing achieves 5.1x speedup,and the number of hops and the maximum transmission latency of path-based multicast routing are reduced by 68.9%and 77.4%,respectively.The maximum length of path is reduced by 68.3%and 67.2%compared with the dual-path(DP)and multi-path(MP)multicast routing,respectively.Therefore,the proposed multicast routing has improved performance in terms of average latency and throughput compared with the DP or MP multicast routing.
Abstract: First-Input-First-Output (FIFO) buffers are extensively used in contemporary digital processors and Systems-on-Chip (SoCs). FIFOs can be synchronous or asynchronous, and FIFOs of different sizes should be implemented in different ways. FIFOs are used not only for pipeline design within a processor and for inter-processor communication networks such as Networks-on-Chip (NoCs), but also for peripherals and clock domain crossing at the whole-SoC level. In this paper, we review the interface, the circuit implementation, and the various usages of FIFOs at different levels of digital design. We find that FIFOs greatly facilitate signal storage, signal decoupling, signal transfer, power domain separation and power domain crossing in digital systems. We hope that more attention is paid to the usage of synchronous and asynchronous FIFOs and that more sophisticated usages are discovered by the digital design community.
Funding: Supported by the National Natural Science Foundation of China (No. 60236020) and the Specialized Research Fund for the Doctoral Program of Higher Education of Ministry of Education, China (No. 20050003083).
Abstract: Network-on-chip (NoC) technology enables a new system-on-chip paradigm, the system-on-network-on-chip (SoNoC) paradigm. One of the challenges in designing application-specific networks is modeling the on-chip system behavior and determining on-chip traffic characteristics. A universal object message level model for SoNoC was defined and an object-oriented methodology was developed to implement this model in hardware and software. The model supports "object to core" synthesis and "function invoking to network" mapping. A case study of an H.263 system verifies the model and methodology. System prototypes are easily built and on-chip traffic can be observed using the SoNoC model to provide real benchmarks for on-chip network design.
Funding: This work was supported by the National High Technology Research and Development 863 Program of China under Grant No. 2015AA01A301, the National Natural Science Foundation of China under Grant No. 61332009, the National HeGaoJi Project of China under Grant No. 2013ZX0102-8001-001-001, and the Beijing Municipal Science and Technology Commission under Grant Nos. Z15010101009 and Z151100003615006.
Abstract: Dataflow architecture has shown its advantages in many high-performance computing cases. In dataflow computing, a large amount of data is frequently transferred among processing elements through the network-on-chip (NoC), so the router design has a significant impact on the performance of dataflow architecture. Common routers are designed for control-flow multi-core architecture, and we find they are not suitable for dataflow architecture. In this work, we analyze and extract the features of data transfers in NoCs of dataflow architecture: multiple destinations, high injection rate, and performance that is sensitive to delay. Based on these three features, we propose a novel and efficient NoC router for dataflow architecture. The proposed router supports multi-destination transfers, so it can deliver data to multiple destinations in a single transfer. Moreover, the router adopts output buffers to maximize throughput and non-flit packets to minimize transfer delay. Experimental results show that the proposed router improves the performance of dataflow architecture by 3.6x over a state-of-the-art router.
Funding: Supported in part by the National Basic Research 973 Program of China under Grant No. 2011CB302503 and the National Natural Science Foundation of China under Grant Nos. 61076037, 60906018, 60921002.
Abstract: Network-on-Chip (NoC), with its excellent scalability and high bandwidth, has been considered the most promising communication architecture for complex integrated systems. However, NoC reliability is increasingly challenged by shrinking semiconductor feature sizes and growing integration density. Moreover, a single node failure in a NoC might destroy the network connectivity and corrupt the entire system. Introducing redundancy is an efficient way to construct a resilient communication path. However, prior redundancy-based work either yields limited reliability with coarse-grained protection or incurs even larger hardware overhead with fine-grained protection. In this paper, we observe that the data path in a NoC, such as links, buffers and crossbars, can be divided into multiple identical parallel slices, which can be used as inherent redundancy to enhance reliability. As long as one fault-free slice remains available, the proposed salvaging scheme, named RevivePath, keeps the overall data path functional. Furthermore, RevivePath uses direct redundancy to protect the control path, such as the switch arbiter and routing computation, providing a fully fault-tolerant scheme for the whole router. Experimental results show that it achieves high reliability with graceful performance degradation even under a high fault rate.
Funding: Supported by the National Natural Science Foundation of China (Nos. 60676009, 60725415, 60971066, 60803038) and the National High-Tech Program of China (Nos. 2009AA01Z258, 2009AA01Z260).
Abstract: Large transmission power consumption and excessive interconnection lines are two shortcomings of conventional networks-on-chip. To improve performance in these areas, this paper proposes a fully asynchronous serial transmission converter for networks-on-chip. By grouping the parallel data between routers into smaller data blocks, the number of interconnection lines between routers can be greatly reduced, which in turn saves power overhead in the transmission process. Null convention logic units are used to make the circuit quasi-delay insensitive and highly robust. The proposed serial transmission converter and serial channel are implemented in SMIC 0.18 μm standard CMOS technology. Results demonstrate that this fully asynchronous serial transmission converter can save up to three quarters of the interconnection line resources and reduce power consumption by up to two-thirds for 32-bit data widths. The proposed converter is applicable to on-chip networks that are sensitive to area and power.
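The grouping of parallel data into smaller blocks can be illustrated with a short behavioral sketch. It assumes a 32-bit word sent as four 8-bit blocks, least significant block first; the block width, the ordering, and the function names are illustrative assumptions, and the sketch ignores the handshake and null convention logic signaling of the actual circuit.

```python
def serialize(word, word_bits=32, block_bits=8):
    """Split a parallel word into smaller blocks, least significant block first."""
    if word_bits % block_bits != 0:
        raise ValueError("word_bits must be a multiple of block_bits")
    mask = (1 << block_bits) - 1
    return [(word >> shift) & mask for shift in range(0, word_bits, block_bits)]


def deserialize(blocks, block_bits=8):
    """Reassemble the original word from blocks received in transmission order."""
    word = 0
    for index, block in enumerate(blocks):
        word |= block << (index * block_bits)
    return word


def wire_saving(word_bits, block_bits):
    """Fraction of data wires saved by sending block_bits instead of word_bits at once."""
    return 1 - block_bits / word_bits


if __name__ == "__main__":
    blocks = serialize(0xDEADBEEF)
    print([hex(b) for b in blocks])
    print(hex(deserialize(blocks)))
    print(wire_saving(32, 8))
```

Under these assumed widths the data wires shrink from 32 to 8, which matches the three-quarters saving quoted for 32-bit data, at the cost of four transfers per word.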
Funding: Supported in part by the National Natural Science Foundation of China (NSFC) under Grant Nos. 60606008, 60633060, and 60776031, the National Basic Research 973 Program of China under Grant No. 2005CB321604, the National High Technology Research and Development 863 Program of China under Grant Nos. 2007AA01Z476, 2007AA01Z109 and 2007AA01Z113, and the Co-Building Program of Beijing Municipal Education Commission.
Abstract: As technology shrinks to the nanometer scale, network-on-chip (NOC) has become a reasonable solution for connecting many IP blocks on a single chip. However, it suffers from both crosstalk effects and single event upsets (SEUs), especially crosstalk-induced delay, which may constrain the overall performance of the NOC. In this paper, we introduce a reliable NOC design using a code capable of both crosstalk avoidance and single error correction. This code, named selected crosstalk avoidance code (SCAC) in our previous work, combines crosstalk avoidance code (CAC) and error correction code (ECC) through codeword selection from an original CAC codeword set. It can handle errors caused by either crosstalk effects or SEUs. In a reliable NOC design, data are encoded into SCAC codewords and can be transmitted rapidly and reliably across the NOC. Experimental results show that the NOC design with SCAC achieves higher performance and reliably tolerates single errors. Compared with previous crosstalk avoidance methods, SCAC reduces wire overhead, power dissipation and total delay. When SCAC is used in a NOC, it saves 20% of the area overhead and reduces power dissipation by 49%.
Funding: Supported by the National High Technology Research and Development Program of China (863 Program) (2002AA1Z149).
Abstract: To ensure the reliability of network-on-chip (NoC) in the presence of faults, a dynamic fault-tolerant routing algorithm is proposed. The algorithm can perform detour routing when there are both static and dynamic permanent faults in the network; that is, a packet can move around the faults to its destination along a non-minimal path. In addition, a multi-level congestion control mechanism enables the algorithm to distribute the load over the whole network and to avoid hotspots around the faults. Simulation results demonstrate the advantage of the proposed routing algorithm in terms of average packet latency and packet loss rate compared with the negative-first routing algorithm and the DyAD routing algorithm in the presence of permanent faults. The proposed algorithm achieves much lower average packet latency and a packet loss rate below 20%.
Funding: Project supported by the National Natural Science Foundation of China (Nos. 60725415, 60971066), the National High-Tech Research and Development Program of China (Nos. 2009AA01Z258, 2009AA01Z260), and the National Science & Technology Important Project of China (No. 2009ZX01034-002-001-005).
Abstract: To address two shortcomings of conventional networks-on-chip, namely the low utilization of channels between routers and excessive interconnection lines, this paper proposes a fully asynchronous self-adaptive bi-directional transmission channel. It uses interconnection lines and register resources with high efficiency, and dynamically detects the data transmission state between routers through a direction regulator, which controls the sequencer to automatically adjust the transmission direction of the bi-directional channel and thus provides a flexible data transmission environment. Null convention logic units are used to make the circuit quasi-delay insensitive and highly robust. The proposed bi-directional transmission channel is implemented in SMIC 0.18 μm standard CMOS technology. Post-layout simulation results demonstrate that this self-adaptive bi-directional channel performs better in throughput, transmission flexibility and channel bandwidth utilization than a conventional single-direction channel. Moreover, the proposed channel can save up to 30% of the interconnection lines and can provide twice the bandwidth resources of a single-direction transmission channel. The proposed channel is applicable to on-chip networks with limited register and interconnection line resources.
Abstract: With the rapid development of the semiconductor industry, the number of cores integrated on a chip is increasing quickly, which brings tough challenges in bandwidth, scalability and power to on-chip interconnection. Against this background, Network-on-Chip (NoC) was proposed and is gradually replacing traditional on-chip interconnections such as the shared bus and the crossbar. For convenience of physical layout, mesh is the most widely used topology in NoC design. The routing algorithm, which decides the paths of packets, has a significant impact on the latency and throughput of the network, and thus plays a vital role in a well-performing network. This study mainly focuses on the routing algorithms of mesh NoC. Depending on whether network information is taken into consideration in routing decisions, NoC routing algorithms can be roughly classified into oblivious routing and adaptive routing. Oblivious routing costs less but lacks adaptiveness, while adaptive routing is the opposite. To combine the advantages of oblivious and adaptive routing, half-adaptive algorithms were proposed. In this paper, the concepts, taxonomy and features of NoC routing algorithms are introduced. Then the importance of routing algorithms in mesh NoC is highlighted, and representative routing algorithms with their respective features are reviewed and summarized. Finally, we try to shed light on future work on NoC routing algorithms.
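As a concrete instance of oblivious routing on a mesh, the sketch below implements dimension-order (XY) routing, in which a packet first corrects its X offset and then its Y offset regardless of network state. The abstract does not single out this algorithm; it is used here only as a familiar example, and the function name `xy_route` and the `(x, y)` coordinate convention are assumptions.

```python
def xy_route(src, dst):
    """Return the list of nodes visited after src under XY dimension-order routing.

    The packet travels along the X dimension until its column matches the
    destination, then along the Y dimension. The path depends only on the
    source and destination, which is what makes the algorithm oblivious.
    """
    x, y = src
    dx, dy = dst
    path = []
    while x != dx:
        x += 1 if dx > x else -1
        path.append((x, y))
    while y != dy:
        y += 1 if dy > y else -1
        path.append((x, y))
    return path


if __name__ == "__main__":
    print(xy_route((0, 0), (2, 1)))
    print(xy_route((3, 3), (1, 3)))
```

An adaptive algorithm would instead choose among several permitted next hops using information such as neighbor buffer occupancy, at the cost of extra logic in each router.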
Abstract: An analytical model is proposed for Network-on-Chip (NoC) with an input-buffered router architecture and finite-size buffers. The model is based on M/G/1/K queuing theory and takes into account the restriction of buffer sizes in NoC. It analyzes the packet's sojourn time in each buffer and calculates the average packet latency in the NoC. The validity of the model is verified through simulation. By comparing our analytical outcomes with the simulation results, we show that the proposed model successfully captures the performance characteristics of NoC and provides an efficient performance analysis tool for NoC design.
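The effect of a finite buffer on sojourn time can be sketched with the simpler M/M/1/K queue, which has a closed-form steady state. This is a deliberate simplification: it assumes exponential service times, whereas the model in the abstract uses general service times (M/G/1/K). The function name and parameters are hypothetical, and `capacity` counts the packet in service plus those waiting.

```python
def mm1k_metrics(arrival_rate, service_rate, capacity):
    """Steady-state blocking probability and mean sojourn time of an M/M/1/K queue.

    The stationary probability of holding n packets is proportional to rho**n
    for n = 0..capacity, where rho = arrival_rate / service_rate.
    """
    rho = arrival_rate / service_rate
    weights = [rho ** n for n in range(capacity + 1)]
    total = sum(weights)
    probs = [w / total for w in weights]

    blocking = probs[capacity]                       # arrivals that find the buffer full
    mean_in_system = sum(n * p for n, p in enumerate(probs))
    accepted_rate = arrival_rate * (1 - blocking)    # effective throughput
    sojourn = mean_in_system / accepted_rate         # Little's law on accepted packets
    return blocking, sojourn


if __name__ == "__main__":
    for k in (1, 4, 16):
        p_block, t = mm1k_metrics(arrival_rate=0.8, service_rate=1.0, capacity=k)
        print(f"K={k}: blocking={p_block:.4f}, mean sojourn={t:.4f}")
```

Larger buffers lower the blocking probability but lengthen the mean sojourn time of accepted packets, which is the trade-off a finite-buffer latency model has to capture.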
Funding: Project supported by the National Natural Science Foundation of China (No. 61625403).
Abstract: To meet the demand for high on-chip network performance, flexible routing algorithms that supply path diversity and alleviate congestion are required. We propose the CAOE-FA router, which combines congestion awareness with fair arbitration. Buffer occupancies are collected from downstream neighbors to indicate congestion levels. Among the candidate outputs permitted by the odd-even (OE) turn model, the most lightly loaded direction is selected; when candidates have the same congestion level, fair arbitration replaces random selection. Experimental results show that CAOE-FA reduces the average packet latency by up to 22.18% and improves the network throughput by up to 68.58%, at negligible hardware cost.
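The selection step described above can be sketched in two parts: a minimal odd-even routing function that lists the permitted output directions, and a selector that picks the least occupied one. The routing function follows the standard minimal odd-even formulation (east-to-north/south turns are forbidden in even columns, north/south-to-west turns in odd columns). The round-robin tie-break is only an assumed stand-in for the fair arbitration mentioned in the abstract, whose actual mechanism is not described here. Coordinates are `(x, y)` with y increasing northward, and all names are hypothetical.

```python
def odd_even_candidates(cur, src, dst):
    """Output directions allowed by minimal odd-even routing at node cur."""
    cx, cy = cur
    sx = src[0]
    dx, dy = dst
    e0, e1 = dx - cx, dy - cy
    if e0 == 0 and e1 == 0:
        return []                          # packet has arrived
    vertical = "N" if e1 > 0 else "S"
    if e0 == 0:
        return [vertical]                  # same column: only vertical moves remain
    if e0 > 0:                             # eastbound packet
        if e1 == 0:
            return ["E"]
        out = []
        if cx % 2 == 1 or cx == sx:        # turning off the east direction is legal here
            out.append(vertical)
        if dx % 2 == 1 or e0 != 1:         # avoid entering an even destination column too early
            out.append("E")
        return out
    out = ["W"]                            # westbound packet
    if e1 != 0 and cx % 2 == 0:            # vertical moves only in even columns
        out.append(vertical)
    return out


class FairSelector:
    """Pick the least occupied candidate; break ties in round-robin order."""

    def __init__(self):
        self._turn = 0

    def select(self, candidates, occupancy):
        least = min(occupancy[d] for d in candidates)
        tied = [d for d in candidates if occupancy[d] == least]
        choice = tied[self._turn % len(tied)]
        if len(tied) > 1:
            self._turn += 1                # rotate priority only when a tie occurred
        return choice


if __name__ == "__main__":
    selector = FairSelector()
    cands = odd_even_candidates(cur=(1, 1), src=(1, 1), dst=(3, 3))
    print(cands, selector.select(cands, {"N": 3, "E": 1}))
```

Here `occupancy` maps each direction to the buffer occupancy reported by the corresponding downstream neighbor; how that value is collected and encoded is left out of the sketch.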
Funding: Supported by the National Natural Science Foundation of China (61834005, 61772417, 61874087) and the Shaanxi International Science and Technology Cooperation Program (2018KW-006).
Abstract: A low-loss, non-blocking, scalable passive optical interconnect network-on-chip (LOOKNoC) structure is proposed based on 2×2 optical exchange switches. It uses wavelength division multiplexing (WDM) technology to expand to 8×8, 16×16, 32×32, and 64×64 passive optical interconnection networks that achieve non-blocking communication. Experimental results show that, for the 16×16 optical interconnection network structure, the number of microring resonators (MRs) in LOOKNoC is reduced by 90.9%, 90.9%, 20.0% and 75.0% compared with the generic wavelength-routed optical router (GWOR), λ-router, Mesh topology and CrossBar structures, respectively. Performance tests of the 16×16 structure on the OMNeT++ platform show that the average insertion loss of LOOKNoC is 3.0%, 11.6%, 4.8% and 16.7% lower than that of the GWOR, λ-router, Mesh, and CrossBar structures, respectively.