A variation-aware task mapping approach is proposed for multi-core networks-on-chip with redundant cores, comprising both design-time mapping and run-time scheduling algorithms. First, a genetic task mapping algorithm is applied at design time to generate multiple task mapping solutions that cover a maximum range of chips. Then, at run time, one optimal task mapping solution is selected, and logical cores are mapped to physically available cores. Both core asymmetry and topological changes are considered in the proposed approach. Experimental results show that the performance yield of the proposed approach is 96% on average, and that communication cost, power consumption and peak temperature are all optimized without loss of performance yield.
As a nanometer-level interconnect, the Optical Network-on-Chip (ONoC) was proposed because it is characterized by low latency, high bandwidth and power efficiency. Compared with a 2-dimensional (2D) design, 3D integration offers higher packing density and shorter wire length, so the 3D ONoC has great potential. In this paper, we first discuss existing ONoC research, and then design mesh and torus ONoCs from the perspectives of topology, router, and routing module, with the help of 3D integration. A simulation platform is established using OPNET to compare the performance of 2D and 3D ONoCs in terms of average delay and packet loss rate. The performance of 3D mesh and 3D torus ONoCs is also compared. The simulation results demonstrate that 3D integration reduces average delay and packet loss rate, and that the 3D torus ONoC performs better than the 3D mesh. Finally, we summarize some future challenges with possible solutions, including microcosmic routing inside optical routers and highly efficient traffic grooming.
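The 2D-versus-3D and mesh-versus-torus comparison in the abstract above can be given some intuition with a hop-count calculation. The sketch below is not the paper's OPNET model: it only computes the average minimal hop count between nodes on a regular grid, with `hop_distance` and `average_hops` being hypothetical helper names, and it treats hop count as a rough proxy for delay.

```python
from itertools import product

def hop_distance(a, b, size, wraparound):
    """Minimal hop count between nodes a and b on a grid with `size` nodes per dimension."""
    hops = 0
    for x, y in zip(a, b):
        d = abs(x - y)
        # A torus adds wrap-around links, so the shorter way round may be used.
        hops += min(d, size - d) if wraparound else d
    return hops

def average_hops(size, dims, wraparound):
    """Average minimal hop count over all ordered pairs of distinct nodes."""
    nodes = list(product(range(size), repeat=dims))
    total = sum(hop_distance(a, b, size, wraparound)
                for a in nodes for b in nodes if a != b)
    return total / (len(nodes) * (len(nodes) - 1))

if __name__ == "__main__":
    # 64 nodes arranged three different ways (illustrative sizes only).
    print("2D mesh  8x8  :", average_hops(8, 2, wraparound=False))
    print("3D mesh  4x4x4:", average_hops(4, 3, wraparound=False))
    print("3D torus 4x4x4:", average_hops(4, 3, wraparound=True))
```

For the same node count, the 3D arrangements give shorter average paths than the 2D mesh, and the torus is shorter again than the mesh. Real delay and packet loss also depend on router design, routing and traffic, which this sketch ignores.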
This paper introduces Twist-routing, a new routing algorithm for faulty on-chip networks, which improves Maze-routing, a face-routing-based algorithm that uses deflections in routing, and achieves full fault coverage and fast packet delivery. To build the Twist-routing algorithm, we use bounding circles, an idea borrowed from the GOAFR+ routing algorithm for ad-hoc wireless networks. Unlike Maze-routing, whose path length is unbounded even when the optimal path length is fixed, Twist-routing bounds the path length by the cube of the optimal path length. Our evaluations show that Twist-routing delivers packets up to 35% faster than Maze-routing under uniform traffic and an Erdős-Rényi failure model, as the failure rate and the injection rate vary.
High transmission power consumption and excessive interconnection lines are two shortcomings of conventional networks-on-chip. To improve on both, this paper proposes a fully asynchronous serial transmission converter for networks-on-chip. By grouping the parallel data between routers into smaller data blocks, the number of interconnection lines between routers can be greatly reduced, which in turn saves power overhead in the transmission process. Null convention logic units are used to make the circuit quasi-delay-insensitive and highly robust. The proposed serial transmission converter and serial channel are implemented in SMIC 0.18 μm standard CMOS technology. Results demonstrate that this fully asynchronous serial transmission converter can save up to three quarters of the interconnection line resources and reduce power consumption by up to two-thirds at a 32-bit data width. The proposed converter is suited to on-chip networks that are sensitive to area and power.
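The grouping of parallel data into smaller blocks described above can be sketched in a few lines. This is a behavioural illustration only, not the paper's null-convention-logic circuit: the 8-bit block size is an assumption (chosen because 8 of 32 data wires leaves one quarter), the function names are hypothetical, and handshake or dual-rail signalling wires are not counted.

```python
def serialize(word, width=32, block=8):
    """Split a `width`-bit word into width // block blocks, least significant block first."""
    if width % block:
        raise ValueError("block size must divide the word width")
    mask = (1 << block) - 1
    return [(word >> shift) & mask for shift in range(0, width, block)]

def deserialize(blocks, block=8):
    """Reassemble the word from blocks received least significant block first."""
    word = 0
    for i, b in enumerate(blocks):
        word |= b << (i * block)
    return word

def data_wire_saving(width=32, block=8):
    """Fraction of data wires saved when only `block` wires carry a `width`-bit word."""
    return 1 - block / width

if __name__ == "__main__":
    blocks = serialize(0xDEADBEEF)
    print([hex(b) for b in blocks])
    print(hex(deserialize(blocks)))
    print(data_wire_saving())
```

The trade-off is that one word now takes `width // block` transfers instead of one, so the link trades wires for transfer time.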
As more and more intellectual property (IP) blocks are integrated on a single chip, the traditional bus-based system-on-chip (SoC) meets several design difficulties, such as low scalability, high power consumption, packet latency and clock tree problems. As a promising solution, the network-on-chip (NoC) has been proposed and widely studied. In this work, a novel algorithm for NoC topology synthesis, the decomposing and cluster refinement (DCR) algorithm, is proposed to minimize the total power consumption of an application-specific NoC. The algorithm is composed of two stages: decomposing with cluster generation, and cluster refinement. In the decomposing and cluster generation stage, an initial low-power solution for the NoC topology is generated. In the cluster refinement stage, the clustering is optimized by performing floorplanning to further reduce power consumption. Meanwhile, a good tradeoff between power consumption and CPU time can be achieved. Experimental results show that the proposed method outperforms existing work.
To address two shortcomings of conventional networks-on-chip, namely low utilization of the channels between routers and excessive interconnection lines, this paper proposes a fully asynchronous self-adaptive bi-directional transmission channel. It uses interconnection lines and register resources with high efficiency, and dynamically detects the data transmission state between routers through a direction regulator, which controls the sequencer to automatically adjust the transmission direction of the bi-directional channel, so as to provide a flexible data transmission environment. Null convention logic units are used to make the circuit quasi-delay-insensitive and highly robust. The proposed bi-directional transmission channel is implemented in SMIC 0.18 μm standard CMOS technology. Post-layout simulation results demonstrate that this self-adaptive bi-directional channel performs better in throughput, transmission flexibility and channel bandwidth utilization than a conventional single-direction channel. Moreover, the proposed channel can save up to 30% of the interconnection lines and can provide twice the bandwidth resources of a single-direction transmission channel. The proposed channel is suited to on-chip networks with limited register and interconnection line resources.
First-Input-First-Output (FIFO) buffers are extensively used in contemporary digital processors and Systems-on-Chip (SoCs). There are synchronous FIFOs and asynchronous FIFOs, and FIFOs of different sizes should be implemented in different ways. FIFOs are used not only for pipeline design within a processor and for inter-processor communication networks, for example Networks-on-Chip (NoCs), but also for peripherals and clock domain crossing at the whole-SoC level. In this paper, we review the interface, the circuit implementation, and the various usages of FIFOs at various levels of digital design. We find that FIFOs can greatly facilitate signal storage, signal decoupling, signal transfer, power domain separation and power domain crossing in digital systems. We hope that more attention will be paid to the usage of synchronous and asynchronous FIFOs and that more sophisticated usages will be discovered by the digital design community.
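The FIFO interface reviewed in the abstract above (push, pop, full and empty flags) can be modelled behaviourally as a ring buffer. The sketch below is a software model of a synchronous FIFO under that assumption, not a circuit from the paper; the class name `SyncFifo` and its methods are hypothetical. An asynchronous FIFO would additionally need pointer synchronization across clock domains, which is not modelled here.

```python
class SyncFifo:
    """Behavioural model of a fixed-depth synchronous FIFO (ring buffer)."""

    def __init__(self, depth):
        if depth <= 0:
            raise ValueError("depth must be positive")
        self.depth = depth
        self.mem = [None] * depth
        self.rd = 0      # read pointer
        self.wr = 0      # write pointer
        self.count = 0   # number of stored items

    @property
    def empty(self):
        return self.count == 0

    @property
    def full(self):
        return self.count == self.depth

    def push(self, item):
        """Write one item; returns False (and drops nothing) if the FIFO is full."""
        if self.full:
            return False
        self.mem[self.wr] = item
        self.wr = (self.wr + 1) % self.depth
        self.count += 1
        return True

    def pop(self):
        """Read the oldest item; returns None if the FIFO is empty."""
        if self.empty:
            return None
        item = self.mem[self.rd]
        self.rd = (self.rd + 1) % self.depth
        self.count -= 1
        return item

if __name__ == "__main__":
    fifo = SyncFifo(depth=2)
    print(fifo.push("a"), fifo.push("b"), fifo.push("c"))  # third push is refused
    print(fifo.pop(), fifo.pop(), fifo.pop())              # oldest item comes out first
```

The producer only needs to watch `full` and the consumer only `empty`, which is the decoupling role the abstract attributes to FIFOs.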
Modulating both the clock frequency and the supply voltage of a network-on-chip (NoC) at run time can reduce power consumption and heat flux, but increases NoC latency. It is therefore necessary to find a tradeoff between power consumption and communication latency. We propose an analytical latency model that shows the relationship between the two. The proposed model is based on the M/G/1 queuing model, which is suitable for dynamic frequency scaling. Experimental results show that the accuracy of this model is more than 90%.
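To give a feel for why an M/G/1 queue suits frequency scaling, the sketch below applies the standard Pollaczek-Khinchine formula for the mean waiting time, W = λ·E[S²] / (2(1 − ρ)) with ρ = λ·E[S]. It is not the paper's latency model: it treats a single router as one queue, and the assumption that service takes a fixed number of clock cycles (so the service time scales as cycles divided by frequency) is introduced here only for illustration, as are the function names.

```python
def mg1_mean_latency(arrival_rate, mean_service, second_moment_service):
    """Mean time in an M/G/1 queue (waiting + service), Pollaczek-Khinchine formula."""
    rho = arrival_rate * mean_service  # server utilization
    if not 0 <= rho < 1:
        raise ValueError("queue is unstable: utilization must be below 1")
    waiting = arrival_rate * second_moment_service / (2 * (1 - rho))
    return waiting + mean_service

def router_latency_at_frequency(arrival_rate, service_cycles, frequency_hz):
    """Assumed model: deterministic service of `service_cycles` clock cycles per packet."""
    s = service_cycles / frequency_hz
    return mg1_mean_latency(arrival_rate, s, s * s)

if __name__ == "__main__":
    # Hypothetical numbers: 1e8 packets/s offered to a router needing 4 cycles per packet.
    for f in (1.0e9, 7.5e8, 5.0e8):
        print(f, router_latency_at_frequency(1e8, 4, f))
```

Lowering the frequency lengthens the service time and raises utilization at once, so latency grows faster than linearly as the clock slows, which is the tradeoff against power that the abstract describes.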
A dual-channel access mechanism is proposed to overcome the drawback of the traditional single-channel access mechanism for networks-on-chip (NoCs). In the traditional single-channel access mechanism, every intellectual property (IP) core has only one channel to access the on-chip network. When the network is relatively idle, the injection rate is too small to make good use of the network resources. When the network is relatively busy, the ejection rate is so small that packets in the network cannot leave immediately, which increases the probability of congestion. In the dual-channel access mechanism, the injection rate of the IP core and the ejection rate of the network are increased by using two optional channels in the network interface (NI) and the local port of the routers, which improves communication performance. Experimental results show that, compared with the traditional single-channel access mechanism, the proposed scheme greatly increases throughput and cuts down average latency with a reasonable area increase.
This paper introduces a new datapath architecture for reconfigurable processors. The proposed datapath is based on the Network-on-Chip approach and facilitates tight coupling of all functional units. Reconfigurable functional elements can be dynamically allocated for application-specific optimizations, enabling polymorphic computing. Using a modified network simulator, the performance of several NoC topologies and parameters is investigated with standard benchmark programs, including fine-grain and coarse-grain computations. Simulation results highlight the flexibility and scalability of the proposed polymorphic NoC processor for a wide range of application domains.
Benefiting from the development of semiconductor technology, many-core systems-on-chip (SoCs) have been widely used in electronic devices. Networks-on-chip (NoCs) can relieve the massive stress of on-chip communication thanks to their high bandwidth, low latency, and good flexibility. Since the deep sub-micron era, reliability has become a critical constraint for integrated circuits. To provide correct data transmission, fault-tolerant NoCs have been researched widely in recent decades, and many valuable designs have been proposed. This work introduces and summarizes state-of-the-art technologies for fault diagnosis and fault recovery in fault-tolerant NoCs. Moreover, it outlines prospects for future research.
Among the components on a many-core chip, the network-on-chip (NoC) already contributes a large portion of overall power consumption. Optimizing NoC performance under a given power budget is further complicated by the need to keep the network connected and to minimize detour distances. In this paper, a NoC power budgeting method from the communication perspective is proposed, which intelligently powers off routers/links and sets up alternative paths to restrict the power and thermal envelope. The performance optimization effect of the proposed power budgeting method is measured in terms of latency: within the given power budget, latency is reduced by 22% on average compared with some competing methods when running real benchmarks.
Traffic hijacking is a common attack perpetrated on networked systems, where attackers eavesdrop on user transactions, manipulate packet data, and divert traffic to illegitimate locations. Similar attacks can also be unleashed in a NoC (Network on Chip) based system where the NoC comes from a third-party vendor and can be engrafted with hardware Trojans. Unlike the attackers on a traditional network, those Trojans are usually small and have limited capacity. This paper targets such a hardware Trojan; specifically, the Trojan aims to divert traffic packets to unauthorized locations on the NoC. To detect this kind of traffic hijacking, we propose an authentication scheme in which the source and destination addresses are tagged. We develop a custom design for the packet tagging and authentication such that the implementation costs can be greatly reduced. Our experiments on a set of applications show that on average the detection circuitry incurs about 3.37% overhead in area, 2.61% in power, and 0.097% in performance when compared to the baseline design.
Network-on-Chip (NoC) is widely adopted in neuromorphic processors to support communication between neurons in spiking neural networks (SNNs). However, SNNs generate enormous numbers of spiking packets due to their one-to-many traffic pattern, and these packets may put communication pressure on the NoC. We propose a path-based multicast routing method to alleviate the pressure. First, all destination nodes of each source node on the NoC are divided into several clusters. Second, multicast paths in the clusters are created based on the Hamiltonian path algorithm. The proposed routing can reduce path length and balance the communication load of each router. Lastly, we design a lightweight NoC microarchitecture, which involves a customized multicast packet and a routing function. We use six datasets to verify the proposed multicast routing. Compared with unicast routing, path-based multicast routing achieves a 5.1x speedup in running time, and reduces the number of hops and the maximum transmission latency by 68.9% and 77.4%, respectively. The maximum path length is reduced by 68.3% and 67.2% compared with dual-path (DP) and multi-path (MP) multicast routing, respectively. Therefore, the proposed multicast routing has improved performance in terms of average latency and throughput compared with DP or MP multicast routing.
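Path-based multicast on a mesh usually starts from a Hamiltonian labelling of the nodes. The sketch below shows the classic snake-order labelling and the dual-path (DP) split of destinations that the abstract uses as a baseline; it is not the proposed clustered method, whose clustering step the abstract does not specify. The function names and the 4x4 mesh are assumptions made for illustration.

```python
def hamiltonian_label(x, y, width):
    """Snake-order label of node (x, y) in a mesh with `width` columns.

    Even rows are numbered left to right and odd rows right to left, so nodes
    with consecutive labels are always physical neighbours.
    """
    if y % 2 == 0:
        return y * width + x
    return y * width + (width - 1 - x)

def dual_path_split(source, destinations, width):
    """Split destinations into the two paths of dual-path multicast.

    One path visits destinations with labels above the source in ascending
    order; the other visits those below the source in descending order.
    """
    def key(node):
        return hamiltonian_label(node[0], node[1], width)

    src = key(source)
    high = sorted((d for d in destinations if key(d) > src), key=key)
    low = sorted((d for d in destinations if key(d) < src), key=key, reverse=True)
    return high, low

if __name__ == "__main__":
    high, low = dual_path_split((1, 1), [(0, 0), (3, 1), (0, 1), (2, 2)], width=4)
    print("high path visits:", high)
    print("low path visits: ", low)
```

With many destinations spread across the mesh, each of the two paths can become long, which is the kind of path length that clustering destinations first, as the abstract describes, aims to shorten.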
As low power consumption is the main design issue in a network on chip (NoC), researchers are concentrating on both algorithmic and architectural approaches. The conventional Dynamic Frequency Scaling (DFS) and History-based Dynamic Frequency Scaling (HDFS) algorithms are used to process energy-constrained data traffic. However, although these conventional algorithms achieve higher energy efficiency, they cause performance degradation due to the additional latency between clock domains. In this paper, we present a variable power optimization interface for NoCs using a Finite State Machine (FSM) approach to attain better performance. The parameters are estimated using 45 nm TSMC CMOS technology. In comparison with the DFS system, the evaluation results show that the FSM-DFS link achieves 81.55% dynamic power savings on the links in the on-chip network, and 37.5% leakage power savings on the link. The proposed work is also evaluated for various performance parameters and compared with conventional work, and the simulation results are superior to those of conventional work.
The Network-on-Chip (NoC), a real-time multiprocessor chip paradigm, offers a promising architecture for future systems-on-chip. Although many Double Tail Sense Amplifiers (DTSAs) are used in architectural approaches, the conventional DTSA with transceiver consumes more energy and incurs more latency than intended under heavy traffic. The Variable Energy-aware sense amplifier Link for Asynchronous NoC (VELAN), which combines Variable DTSA circuitry (V-DTSA) and a transceiver, is designed in this research to eliminate this difficulty. The V-DTSA circuitry comprises a bootable DTSA (B-DTSA), a bootable clock-gating DTSA (BCG-DTSA), a Graph-theory-based Traffic Estimator (GTE) and a controller. Depending on the traffic rate, the controller activates the necessary DTSA modules and transfers information to the receiver. The proposed VELAN design is evaluated in TSMC 90 nm technology, showing a 6.157 Gb/s data rate, 0.27 W total link power and 354 ps latency for single-stage operation.
Dataflow architecture has shown its advantages in many high-performance computing cases. In dataflow computing, a large amount of data is frequently transferred among processing elements through the network-on-chip (NoC), so the router design has a significant impact on the performance of a dataflow architecture. Common routers are designed for control-flow multi-core architectures, and we find they are not suitable for dataflow architectures. In this work, we analyze and extract the features of data transfers in the NoCs of dataflow architectures: multiple destinations, high injection rate, and performance sensitivity to delay. Based on these three features, we propose a novel and efficient NoC router for dataflow architectures. The proposed router supports multiple destinations, so it can deliver data to several destinations in a single transfer. Moreover, the router adopts output buffers to maximize throughput and non-flit packets to minimize transfer delay. Experimental results show that the proposed router can improve the performance of a dataflow architecture by 3.6x over a state-of-the-art router.
Network-on-Chip (NoC), with its excellent scalability and high bandwidth, has been considered the most promising communication architecture for complex integrated systems. However, NoC reliability is becoming increasingly challenging as semiconductor feature sizes shrink and integration density increases. Moreover, a single node failure in a NoC might destroy the network connectivity and corrupt the entire system. Introducing redundancy is an efficient method of constructing a resilient communication path. However, prior redundancy-based work either offers limited reliability with coarse-grain protection or involves even larger hardware overhead with fine-grain protection. In this paper, we observe that the data path in a NoC, such as links, buffers and crossbars, can be divided into multiple identical parallel slices, which can be used as inherent redundancy to enhance reliability. As long as one fault-free slice remains available, the proposed salvaging scheme, named RevivePath, can keep the overall data path functional. Furthermore, RevivePath uses direct redundancy to protect the control path, such as the switch arbiter and routing computation, providing a full fault-tolerant scheme for the whole router. Experimental results show that it achieves high reliability with graceful performance degradation even under a high fault rate.
As technology shrinks into the nanometer scale, network-on-chip (NOC) has become a reasonable solution for connecting many IP blocks on a single chip. But it suffers from both crosstalk effects and single event upsets (SEUs), especially crosstalk-induced delay, which may constrain the overall performance of the NOC. In this paper, we introduce a reliable NOC design using a code capable of both crosstalk avoidance and single error correction. Such a code, named selected crosstalk avoidance code (SCAC) in our previous work, joins crosstalk avoidance code (CAC) and error correction code (ECC) together through codeword selection from an original CAC codeword set. It can handle errors caused by either crosstalk effects or SEUs. In the reliable NOC design, data are encoded into SCAC codewords and can be transmitted rapidly and reliably across the NOC. Experimental results show that the NOC design with SCAC achieves higher performance and reliably tolerates single errors. Compared with previous crosstalk avoidance methods, SCAC reduces wire overhead, power dissipation and total delay. When SCAC is used in a NOC, it can save 20% of the area overhead and reduce power dissipation by 49%.
Funding: Supported in part by the National Natural Science Foundation of China (Grant Nos. 61401082, 61471109, 61502075, 61672123, 91438110, U1301253), the Fundamental Research Funds for Central Universities (Grant Nos. N161604004, N161608001, N150401002, DUT15RC(3)009), the Liaoning Bai Qian Wan Talents Program, and the National High-Level Personnel Special Support Program for Youth Top-Notch Talent.
Funding: Supported by the National Natural Science Foundation of China (Nos. 60676009, 60725415, 60971066, 60803038) and the National High-Tech Program of China (Nos. 2009AA01Z258, 2009AA01Z260).
Funding: Project supported by the National Natural Science Foundation of China (Nos. 60725415, 60971066), the National High-Tech Research and Development Program of China (Nos. 2009AA01Z258, 2009AA01Z260), and the National Science & Technology Important Project of China (No. 2009ZX01034-002-001-005).
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 61376024 and 61306024, the Natural Science Foundation of Guangdong Province under Grant No. S2013040014366, and the Basic Research Programme of Shenzhen under Grant Nos. JCYJ20140417113430642 and JCYJ20140901003939020.
Funding: Supported by the High Technology Research and Development Program of Fujian Province (2010HZ0004-1, 2009HZ0003-1).
Abstract: A dual-channel access mechanism is proposed to overcome the drawback of the traditional single-channel access mechanism for network-on-chip (NoC). In the traditional single-channel access mechanism, every intellectual property (IP) core has only one channel to access the on-chip network. When the network is relatively idle, the injection rate is too small to make good use of the network resources. When the network is relatively busy, the ejection rate is so small that packets in the network cannot leave immediately, which increases the probability of congestion. In the dual-channel access mechanism, the injection rate of the IP core and the ejection rate of the network are increased by using two optional channels in the network interface (NI) and the local port of the routers, which improves communication performance. Experimental results show that, compared with the traditional single-channel access mechanism, the proposed scheme greatly increases the throughput and cuts down the average latency with a reasonable area increase.
Abstract: This paper introduces a new datapath architecture for reconfigurable processors. The proposed datapath is based on the Network-on-Chip approach and facilitates tight coupling of all functional units. Reconfigurable functional elements can be dynamically allocated for application-specific optimizations, enabling polymorphic computing. Using a modified network simulator, the performance of several NoC topologies and parameters is investigated with standard benchmark programs, including fine-grain and coarse-grain computations. Simulation results highlight the flexibility and scalability of the proposed polymorphic NoC processor for a wide range of application domains.
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 61534002 and 61701095, and the Fundamental Research Funds for the Central Universities under Grant No. ZYGX2016J042.
Abstract: Benefiting from the development of semiconductor technology, many-core systems-on-chip (SoCs) have been widely used in electronic devices. Networks-on-chip (NoCs) can relieve the massive stress of on-chip communication thanks to their high bandwidth, low latency, and good flexibility. Since the deep sub-micron era, reliability has become a critical constraint for integrated circuits. To provide correct data transmission, fault-tolerant NoCs have been widely researched in the last decades, and many valuable designs have been proposed. This work introduces and summarizes the state-of-the-art technologies for fault diagnosis and fault recovery in fault-tolerant NoCs. Moreover, it outlines prospects for future research.
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 61376024 and 61306024, the Natural Science Foundation of Guangdong Province under Grant No. S2013040014366, and the Basic Research Programme of Shenzhen under Grant Nos. JCYJ20140417113430642 and JCYJ20140901003939020.
Abstract: Among the components on a many-core chip, the network-on-chip (NoC) already contributes a large portion of the overall power consumption. Optimizing NoC performance under a given power budget is further complicated by the need to maintain network connectivity and minimize detour distances. In this paper, a NoC power budgeting method from the communication perspective is proposed, which intelligently powers off routers/links and sets up alternative paths to restrict the power and thermal envelope. The performance optimization achieved by the proposed power budgeting method is measured in terms of latency: within the given power budget, latency is reduced by 22% on average compared with some competing methods when running real benchmarks.
Abstract: Traffic hijacking is a common attack perpetrated on networked systems, where attackers eavesdrop on user transactions, manipulate packet data, and divert traffic to illegitimate locations. Similar attacks can also be unleashed in a NoC (Network on Chip) based system where the NoC comes from a third-party vendor and can be engrafted with hardware Trojans. Unlike the attackers on a traditional network, those Trojans are usually small and have limited capacity. This paper targets such a hardware Trojan; specifically, the Trojan aims to divert traffic packets to unauthorized locations on the NoC. To detect this kind of traffic hijacking, we propose an authentication scheme in which the source and destination addresses are tagged. We develop a custom design for the packet tagging and authentication such that the implementation costs can be greatly reduced. Our experiments on a set of applications show that on average the detection circuitry incurs about 3.37% overhead in area, 2.61% in power, and 0.097% in performance when compared to the baseline design.
Funding: Supported by the National Key Research and Development Program of China under Grant Nos. 2018YFB2202-603 and 2020AAA0104602.
Abstract: Network-on-Chip (NoC) is widely adopted in neuromorphic processors to support communication between neurons in spiking neural networks (SNNs). However, SNNs generate an enormous number of spiking packets due to the one-to-many traffic pattern, and these packets may put communication pressure on the NoC. We propose a path-based multicast routing method to alleviate the pressure. Firstly, all destination nodes of each source node on the NoC are divided into several clusters. Secondly, multicast paths in the clusters are created based on the Hamiltonian path algorithm. The proposed routing can reduce the path length and balance the communication load of each router. Lastly, we design a lightweight NoC microarchitecture, which involves a customized multicast packet and a routing function. We use six datasets to verify the proposed multicast routing. Compared with unicast routing, path-based multicast routing achieves a 5.1x speedup in running time, and the number of hops and the maximum transmission latency are reduced by 68.9% and 77.4%, respectively. The maximum path length is reduced by 68.3% and 67.2% compared with dual-path (DP) and multi-path (MP) multicast routing, respectively. Therefore, the proposed multicast routing has improved performance in terms of average latency and throughput compared with DP or MP multicast routing.
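The Hamiltonian-path idea behind path-based multicast can be sketched as follows. This is not the clustering method of the paper; it shows only the classic building block, assuming a 2D mesh labeled in snake (boustrophedon) order and a dual-path-style split of destinations into those above and below the source label. All function names are illustrative.

```python
def hamiltonian_label(x, y, width):
    """Snake-order label of node (x, y) on a 2D mesh with `width` columns.

    Even rows are numbered left to right, odd rows right to left, so that
    consecutive labels always belong to neighboring nodes.
    """
    offset = x if y % 2 == 0 else width - 1 - x
    return y * width + offset


def split_destinations(src, dests, width):
    """Split destinations into a 'high' and a 'low' multicast path.

    The high path visits destinations with labels above the source in
    ascending order; the low path visits the rest in descending order.
    """
    def label(node):
        return hamiltonian_label(node[0], node[1], width)

    s = label(src)
    high = sorted((d for d in dests if label(d) > s), key=label)
    low = sorted((d for d in dests if label(d) < s), key=label, reverse=True)
    return high, low


# 4-column mesh, source at (1, 1), four destinations.
high_path, low_path = split_destinations(
    (1, 1), [(0, 0), (3, 1), (0, 1), (2, 2)], width=4
)
```

Because each packet only ever moves toward higher (or only toward lower) labels, one packet can serve several destinations along its path; grouping destinations into clusters first, as the abstract describes, would shorten those paths further.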
Abstract: As low power consumption is the main design issue in a network on chip (NoC), researchers are concentrating more on both algorithmic and architectural approaches. The conventional Dynamic Frequency Scaling (DFS) and History-based Dynamic Frequency Scaling (HDFS) algorithms are used to process energy-constrained data traffic. Although these conventional algorithms achieve higher energy efficiency, they cause performance degradation due to the auxiliary latency between clock domains. In this paper, we present a variable power optimization interface for NoC using a Finite State Machine (FSM) approach to attain better performance. The parameters are estimated using 45 nm TSMC CMOS technology. In comparison with the DFS system, the evaluation results show that the FSM-DFS link achieves 81.55% dynamic power savings on the links in the on-chip network and 37.5% leakage power savings on the link. The proposed work is also evaluated on various performance parameters and compared with conventional work, and the simulation results are superior to those of conventional work.
Abstract: The Network-on-Chip (NoC), a real-time multiprocessor chip paradigm, offers a promising architecture for future systems-on-chip. Even though many Double Tail Sense Amplifiers (DTSAs) are used in architectural approaches, the conventional DTSA with transceiver consumes more energy and incurs more latency than intended under heavy traffic conditions. To eliminate this difficulty, this research designs a Variable Energy aware sense amplifier Link for Asynchronous NoC (VELAN), which combines Variable DTSA (V-DTSA) circuitry with a transceiver. The V-DTSA circuitry has the following components: a bootable DTSA (B-DTSA), a bootable clock gating DTSA (BCG-DTSA), a Graph theory based Traffic Estimator (GTE) and a controller. Depending on the traffic rate, the controller activates the necessary DTSA modules and transfers information to the receiver. The proposed VELAN design is evaluated on TSMC 90 nm technology, showing a 6.157 Gb/s data rate, 0.27 W total link power and 354 ps latency for single-stage operation.
Funding: This work was supported by the National High Technology Research and Development 863 Program of China under Grant No. 2015AA01A301, the National Natural Science Foundation of China under Grant No. 61332009, the National HeGaoJi Project of China under Grant No. 2013ZX0102-8001-001-001, and the Beijing Municipal Science and Technology Commission under Grant Nos. Z15010101009 and Z151100003615006.
Abstract: Dataflow architecture has shown its advantages in many high-performance computing cases. In dataflow computing, a large amount of data is frequently transferred among processing elements through the network-on-chip (NoC). Thus the router design has a significant impact on the performance of dataflow architecture. Common routers are designed for control-flow multi-core architecture, and we find that they are not suitable for dataflow architecture. In this work, we analyze and extract the features of data transfers in NoCs of dataflow architecture: multiple destinations, high injection rate, and performance that is sensitive to delay. Based on these three features, we propose a novel and efficient NoC router for dataflow architecture. The proposed router supports multi-destination transfers; thus it can deliver data to multiple destinations in a single transfer. Moreover, the router adopts an output buffer to maximize throughput and adopts non-flit packets to minimize transfer delay. Experimental results show that the proposed router can improve the performance of dataflow architecture by 3.6x over a state-of-the-art router.
Funding: Supported in part by the National Basic Research 973 Program of China under Grant No. 2011CB302503 and the National Natural Science Foundation of China under Grant Nos. 61076037, 60906018, and 60921002.
Abstract: Network-on-Chip (NoC), with its excellent scalability and high bandwidth, has been considered the most promising communication architecture for complex integrated systems. However, NoC reliability is becoming increasingly challenging because of shrinking semiconductor feature sizes and increasing integration density. Moreover, a single node failure in a NoC might destroy the network connectivity and corrupt the entire system. Introducing redundancy is an efficient way to construct a resilient communication path. However, prior redundancy-based work either achieves limited reliability with coarse-grain protection or incurs even larger hardware overhead with fine-grain protection. In this paper, we observe that the data path in a NoC, such as links, buffers and crossbars, can be divided into multiple identical parallel slices, which can be used as inherent redundancy to enhance reliability. As long as one fault-free slice remains available, the proposed salvaging scheme, named RevivePath, keeps the overall data path functional. Furthermore, RevivePath uses direct redundancy to protect the control path, such as the switch arbiter and routing computation, providing a full fault-tolerant scheme for the whole router. Experimental results show that it achieves quite high reliability with graceful performance degradation even under a high fault rate.
Funding: Supported in part by the National Natural Science Foundation of China (NSFC) under Grant Nos. 60606008, 60633060, and 60776031, the National Basic Research 973 Program of China under Grant No. 2005CB321604, the National High Technology Research and Development 863 Program of China under Grant Nos. 2007AA01Z476, 2007AA01Z109 and 2007AA01Z113, and the Co-Building Program of Beijing Municipal Education Commission.
Abstract: As technology shrinks into the nanometer scale, network-on-chip (NOC) has become a reasonable solution for connecting many IP blocks on a single chip. However, it suffers from both crosstalk effects and single event upsets (SEUs), especially crosstalk-induced delay, which may constrain the overall performance of the NOC. In this paper, we introduce a reliable NOC design using a code capable of both crosstalk avoidance and single error correction. This code, named selected crosstalk avoidance code (SCAC) in our previous work, combines crosstalk avoidance code (CAC) and error correction code (ECC) through codeword selection from an original CAC codeword set. It can handle errors caused by either crosstalk effects or SEUs. In a reliable NOC design, data are encoded into SCAC codewords and can be transmitted rapidly and reliably across the NOC. Experimental results show that the NOC design with SCAC achieves higher performance and tolerates single errors. Compared with previous crosstalk avoidance methods, SCAC reduces wire overhead, power dissipation and total delay. When SCAC is used in a NOC, it can reduce area overhead by 20% and power dissipation by 49%.
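The idea of selecting error-correcting codewords from a crosstalk avoidance codeword set can be illustrated with a small sketch. It is not the SCAC construction from the paper, which this abstract does not specify. As assumptions, the CAC set is taken to be the forbidden-pattern-free words (no "010" or "101" on adjacent wires), and codewords are picked greedily so that any two differ in at least three bit positions, which is the usual condition for single error correction.

```python
from itertools import product


def is_forbidden_pattern_free(word):
    """True if no three adjacent wires carry the pattern 010 or 101."""
    return "010" not in word and "101" not in word


def hamming(a, b):
    """Number of bit positions in which two equal-length words differ."""
    return sum(x != y for x, y in zip(a, b))


def select_codewords(n_wires, min_dist=3):
    """Greedily pick crosstalk-avoiding words with pairwise distance >= min_dist."""
    cac_set = [
        "".join(bits)
        for bits in product("01", repeat=n_wires)
        if is_forbidden_pattern_free("".join(bits))
    ]
    chosen = []
    for word in cac_set:
        if all(hamming(word, c) >= min_dist for c in chosen):
            chosen.append(word)
    return chosen


codebook = select_codewords(7)
```

A greedy pass like this does not guarantee the largest possible codebook; it only shows that both constraints can be enforced on the same set of words, which is the property the abstract attributes to SCAC.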