Funding: Project supported by the National Basic Research Program of China (Grant No. 2011CB301701), the National High Technology Research and Development Program of China (Grant Nos. 2013AA014402, 2012AA012202, and 2015AA016904), and the National Natural Science Foundation of China (Grant Nos. 61275065 and 61107048).
Abstract: The first path-independent insertion-loss (PILOSS) strictly non-blocking 4×4 silicon electro-optic switch matrix is reported. The footprint of the switch matrix is only 4.6 mm × 1.0 mm. With single-arm modulation, the measured crosstalk ranges from -13 dB to -27 dB. A maximum crosstalk deterioration of 6 dB caused by two-path interference is also observed.
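As a rough illustration of how two-path interference can worsen crosstalk by about 6 dB, the sketch below sums two leakage fields coherently. It assumes two leakage paths of equal amplitude (the -20 dB level is a hypothetical value, not a measurement from the paper). When the two fields arrive in phase, the field doubles and the power quadruples, so crosstalk rises by 20·log10(2) ≈ 6.02 dB.

```python
import cmath
import math
from typing import Sequence


def crosstalk_db(amplitudes: Sequence[float], phases_rad: Sequence[float]) -> float:
    """Crosstalk power (dB, relative to the signal) from coherently summed leakage fields."""
    field = sum(a * cmath.exp(1j * p) for a, p in zip(amplitudes, phases_rad))
    power = max(abs(field) ** 2, 1e-30)  # floor avoids log10(0) on perfect cancellation
    return 10 * math.log10(power)


# Hypothetical leakage amplitude for a single path: -20 dB in power.
a = 10 ** (-20 / 20)

single = crosstalk_db([a], [0.0])                      # one leakage path
in_phase = crosstalk_db([a, a], [0.0, 0.0])            # worst case: constructive
quadrature = crosstalk_db([a, a], [0.0, math.pi / 2])  # 90 degree phase offset

print(f"single path: {single:.2f} dB")
print(f"two paths in phase: {in_phase:.2f} dB (+{in_phase - single:.2f} dB)")
print(f"two paths in quadrature: {quadrature:.2f} dB (+{quadrature - single:.2f} dB)")
```

The in-phase case is about 6.02 dB worse than a single path, while a 90° offset adds only about 3 dB; the actual penalty in a device depends on the relative phases of its leakage paths.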
Funding: Supported by the National Natural Science Foundation of China (Nos. 61834005, 61772417, 61802304, 61602377, and 61634004), the Shaanxi Provincial Co-ordination Innovation Project of Science and Technology (No. 2016KTZDGY02-04-02), the Shaanxi Provincial Key R&D Plan (No. 2017GY-060), and the Shaanxi International Science and Technology Cooperation Program (No. 2018KW-006).
Abstract: Electrical routers are widely used to interconnect the cores of multi-core systems. However, as the number of processor cores grows, the probability of communication conflicts between cores increases and data latency rises dramatically. With the advent of optical routers, the traditional electrical interconnection mode is giving way to optical interconnection. In packet-switched optical interconnection networks, data communication consists of three phases: link establishment, data transmission, and link termination; this circuit-switched style of data transmission greatly limits resource utilization. The number of micro-ring resonators in a large-scale on-chip optical interconnection network is an important parameter affecting insertion loss. Previously proposed structures such as λ-router, GWOR, and Crossbar have a large overall network insertion loss because they use many micro-ring resonators, so realizing non-blocking communication among multiple cores with as few micro-ring resonators as possible has become a research hotspot. To improve bandwidth and reduce access latency, this paper proposes an optical interconnection structure called the multilevel switching optical network on chip (MSONoC). Broadband micro-ring resonators (BMRs) are employed to reduce the number of micro-ring resonators (MRs) in the network, and, with wavelength division multiplexing (WDM), the structure provides non-blocking point-to-point communication. The results show that, compared with λ-router, GWOR, Crossbar, and the new topology structure, MSONoC reduces the number of micro-ring resonators by 95.5%, 95.5%, 87.5%, and 60%, respectively. The minimum-link insertion loss of the new topology, mesh, and MSONoC structures is 0.73 dB, 0.725 dB, and 0.38 dB, respectively.
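Why the micro-ring count matters can be made concrete with an additive dB loss budget: every element a signal passes contributes its loss, so removing rings from a path lowers the total. The per-element values below are hypothetical placeholders, not figures from the paper, and the helper `link_insertion_loss_db` is purely illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LossParams:
    """Hypothetical per-element losses in dB (placeholders, not measured values)."""
    waveguide_db_per_cm: float = 1.0
    crossing_db: float = 0.12
    mr_drop_db: float = 0.5       # signal coupled into an on-resonance ring
    mr_through_db: float = 0.005  # signal passing an off-resonance ring


def link_insertion_loss_db(length_cm: float, crossings: int, mr_drops: int,
                           mr_throughs: int, p: LossParams = LossParams()) -> float:
    """Losses in dB add along a path, so each extra element raises the total."""
    return (length_cm * p.waveguide_db_per_cm
            + crossings * p.crossing_db
            + mr_drops * p.mr_drop_db
            + mr_throughs * p.mr_through_db)


# Two hypothetical paths of equal length: one passes many rings, one only a few.
many_rings = link_insertion_loss_db(0.2, crossings=4, mr_drops=1, mr_throughs=30)
few_rings = link_insertion_loss_db(0.2, crossings=1, mr_drops=1, mr_throughs=3)
print(f"{many_rings:.3f} dB vs {few_rings:.3f} dB")
```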
Funding: The National High Technology Research and Development Program of China (No. 2007AA01Z309) and the National Natural Science Foundation of China (Nos. 60803160 and 60873030).
Abstract: In data-stream or web scenarios where data arrive at highly variable and unpredictable rates, a good join algorithm should be able to "hide" delays by continuing to output join results. Non-blocking algorithms allow some tuples to be flushed to disk so that results can be produced continuously while data transmission is suspended. However, state-of-the-art algorithms struggle under the constraint of limited allocated memory. To make better use of memory, a novel non-blocking join algorithm based on hash-merge is proposed to improve query response times. A reduced data structure for in-memory tuples improves memory utilization. A replacement selection tree is used to adjust memory by expanding or shrinking the tree, and it splits one external join transaction into multiple subtasks. In addition, a cost model that estimates task output rate is proposed to select, in the external join stage, the on-disk portion that promises to produce results fastest. Experiments show that, with far less memory, the technique delivers results faster than three existing non-blocking join algorithms (XJoin, HMJ, and RPJ), with up to nearly a two-fold improvement on a reliable network and up to one order of magnitude improvement on an unreliable network in terms of the number of reported tuples.
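The replacement selection tree named above is a classic external-sorting technique: a fixed-size priority queue emits keys in sorted order and defers any key smaller than the last one emitted to the next run, so runs are typically longer than the memory holding the tree. The sketch below shows only that generic technique using Python's `heapq`; how the paper resizes the tree and maps runs to join subtasks is not reproduced, and `capacity` simply stands in for the memory allocated to the tree.

```python
import heapq
from typing import Iterable, List, Tuple


def replacement_selection(keys: Iterable[int], capacity: int) -> List[List[int]]:
    """Split a key stream into sorted runs using a heap of `capacity` entries."""
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    it = iter(keys)
    # Heap entries are (run number, key): keys deferred to the next run sort last.
    heap: List[Tuple[int, int]] = []
    for key in it:
        heap.append((0, key))
        if len(heap) == capacity:
            break
    heapq.heapify(heap)

    runs: List[List[int]] = []
    while heap:
        run, key = heapq.heappop(heap)
        if run == len(runs):  # first key of a new run
            runs.append([])
        runs[run].append(key)
        nxt = next(it, None)
        if nxt is not None:
            # A key smaller than the one just emitted cannot join the current run.
            heapq.heappush(heap, (run if nxt >= key else run + 1, nxt))
    return runs


runs = replacement_selection([5, 1, 4, 2, 8, 3, 7, 6], capacity=3)
print(runs)
```

With room for only three keys, the first run already holds six; growing or shrinking `capacity` is the knob that trades memory for run length.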
Funding: Air Force Research Laboratory (AFRL) (FA8650-15-2-5220); Advanced Research Projects Agency-Energy (ARPA-E) (DE-AR00000843); European Commission (EC) (H2020-731954); Rockport Networks Inc.
Abstract: We report the first monolithically integrated microring-based optical switch in the switch-and-select architecture. The switch fabric delivers strictly non-blocking connectivity while completely canceling first-order crosstalk. The 4 × 4 switching circuit consists of eight silicon microring-based spatial (de-)multiplexers interconnected by a Si/SiN dual-layer, crossing-free central shuffle. Analysis of the on-state and off-state power transfer functions shows that individual ring resonators have extinction ratios exceeding 25 dB, which yields switch crosstalk suppression of up to more than 50 dB in the switch-and-select topology. Assessment of the optical paths shows losses as low as 0.1 dB per off-resonance ring and 0.5 dB per on-resonance ring. Photonic switching is actuated with integrated micro-heaters, giving a passband of approximately 24 GHz. The fully packaged device is flip-chip bonded onto a printed circuit board breakout board and fitted with a UV-cured fiber array.
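Why two rings with 25 dB extinction ratios can give more than 50 dB of suppression can be shown with a simple dB budget. The sketch below assumes (our assumption, not a statement of the paper's exact layout) that first-order leakage must traverse one off-state ring in the switch stage and one in the select stage, and that each 1×4 stage is a bus with four rings; the 0.1 dB and 0.5 dB per-ring losses are the best-case values quoted in the abstract.

```python
from typing import Tuple


def switch_and_select_budget(er_switch_db: float, er_select_db: float,
                             off_rings: int, on_rings: int,
                             off_loss_db: float = 0.1,
                             on_loss_db: float = 0.5) -> Tuple[float, float]:
    """Return (first-order crosstalk suppression, path loss), both in dB.

    Leaked light must pass an off-state switch ring and then an off-state
    select ring, so the two extinction ratios add in dB.
    """
    suppression_db = er_switch_db + er_select_db
    path_loss_db = off_rings * off_loss_db + on_rings * on_loss_db
    return suppression_db, path_loss_db


# Hypothetical worst-case 4x4 path: in each bus-coupled 1x4 stage the light
# passes three off-resonance rings and drops at one on-resonance ring.
suppression, loss = switch_and_select_budget(25.0, 25.0, off_rings=6, on_rings=2)
print(f"crosstalk suppression >= {suppression:.0f} dB, path loss ~ {loss:.1f} dB")
```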
Funding: Supported by the National Natural Science Foundation of China (61834005, 61772417, and 61874087) and the Shaanxi International Science and Technology Cooperation Program (2018KW-006).
Abstract: A low-loss, non-blocking, scalable passive optical interconnect network-on-chip (LOOKNoC) structure is proposed. Built from 2×2 optical exchange switches and using wavelength division multiplexing (WDM), it scales to 8×8, 16×16, 32×32, and 64×64 passive optical interconnection networks that achieve non-blocking communication. The experimental results show that, for the 16×16 network, LOOKNoC uses 90.9%, 90.9%, 20.0%, and 75.0% fewer microring resonators (MRs) than the generic wavelength-routed optical router (GWOR), λ-router, topology, and CrossBar structures, respectively. Performance tests of the 16×16 structure on the OMNeT++ platform show that the average insertion loss of LOOKNoC is 3.0%, 11.6%, 4.8%, and 16.7% lower than that of the GWOR, λ-router, Mesh, and CrossBar structures, respectively.
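Passive WDM routers achieve non-blocking communication by giving every source/destination pair its own wavelength so that no two simultaneous connections collide at a sender or a receiver. One common textbook scheme is the cyclic assignment λ = (d - s) mod N; the sketch below checks its contention-freedom for N = 16. It illustrates the general idea only and is an assumption, not LOOKNoC's actual wavelength map, which the abstract does not give.

```python
from typing import Dict, Tuple

Assignment = Dict[Tuple[int, int], int]  # (source, destination) -> wavelength index


def cyclic_wavelength_assignment(n: int) -> Assignment:
    """Give each source/destination pair the wavelength index (dst - src) mod n."""
    return {(s, d): (d - s) % n for s in range(n) for d in range(n) if s != d}


def is_contention_free(assign: Assignment, n: int) -> bool:
    """Non-blocking check: no node sends, and no node receives,
    two different connections on the same wavelength."""
    for node in range(n):
        sent = [assign[(node, d)] for d in range(n) if d != node]
        received = [assign[(s, node)] for s in range(n) if s != node]
        if len(set(sent)) != len(sent) or len(set(received)) != len(received):
            return False
    return True


assign = cyclic_wavelength_assignment(16)
print(is_contention_free(assign, 16), len(set(assign.values())))
```

Under this scheme an N-node network needs N - 1 wavelengths, because the index 0 would only be used for a node talking to itself.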
Funding: The National Natural Science Foundation of China (Grant Nos. 60273038 and 90412014), the Program for New Century Excellent Talents in University of MOE (Grant No. NCET-04-0478), and the Jiangsu "Six Top Talents" program.
Abstract: Total ordering of messages is a critical part of active replication, as it maintains consistency among the members of a fault-tolerant group. This paper proposes a non-blocking message total ordering protocol (NBTOP) for distributed systems. The non-blocking property means that when a fault-tolerant group evolves, its members keep running independently without waiting to install the same group view, even when decision messages collide. NBTOP uses a token ring as its logical control mechanism. Members adopt a re-requesting mechanism (RR) to obtain lost decisions, and a forward acknowledgement mechanism (FA) is proposed to resolve decision collisions. The paper further proves that NBTOP satisfies the properties of total order, agreement, and termination. NBTOP has been implemented and its performance tested. Compared with Totem, NBTOP achieves a lower total ordering delay, which indicates that the non-blocking property helps improve protocol efficiency.
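The token-ring sequencing and re-requesting ideas can be sketched in a few lines: the token carries a global sequence counter that each holder uses to stamp its queued messages, and each receiver delivers strictly in sequence order, holding back early arrivals and listing gaps to re-request. This is a generic, Totem-style illustration under simplifying assumptions (a single token rotation, no failures, no view changes); it does not implement NBTOP's forward acknowledgement mechanism, and the names `assign_order` and `Receiver` are hypothetical.

```python
from typing import Dict, List, Tuple

Stamped = Tuple[int, str, str]  # (global sequence number, sender, payload)


def assign_order(pending: Dict[str, List[str]], ring: List[str]) -> List[Stamped]:
    """One token rotation: each holder stamps its queued messages in turn."""
    seq = 0  # counter carried by the token
    stamped: List[Stamped] = []
    for member in ring:  # the token visits members in ring order
        for payload in pending.get(member, []):
            stamped.append((seq, member, payload))
            seq += 1
    return stamped


class Receiver:
    """Delivers messages strictly in sequence order, holding back early arrivals."""

    def __init__(self) -> None:
        self.next_seq = 0
        self.held: Dict[int, Tuple[str, str]] = {}
        self.delivered: List[Tuple[str, str]] = []

    def on_message(self, seq: int, sender: str, payload: str) -> None:
        if seq < self.next_seq:
            return  # duplicate, e.g. a re-sent message already delivered
        self.held[seq] = (sender, payload)
        while self.next_seq in self.held:
            self.delivered.append(self.held.pop(self.next_seq))
            self.next_seq += 1

    def gaps(self) -> List[int]:
        """Sequence numbers to re-request: missing below the highest held one."""
        if not self.held:
            return []
        return [s for s in range(self.next_seq, max(self.held)) if s not in self.held]


stamped = assign_order({"A": ["a1", "a2"], "B": ["b1"], "C": ["c1"]}, ["A", "B", "C"])
r1, r2 = Receiver(), Receiver()
for msg in stamped:          # r1 sees every message in order
    r1.on_message(*msg)
for i in (2, 0, 3):          # r2 sees them out of order and misses seq 1
    r2.on_message(*stamped[i])
print(r2.gaps())             # sequence numbers r2 would re-request
r2.on_message(*stamped[1])   # the re-requested message arrives
print(r1.delivered == r2.delivered)
```

Both receivers end with the same delivery order regardless of arrival order, which is the total-order property in miniature.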
Funding: Information Industry Bureau of Chongqing (200113010 and 200216006).
Abstract: A sorting algorithm based on Batcher's algorithm is presented, and an 8×8 multistage interconnection network (MIN) is constructed. By applying wavelength division multiplexing (WDM) technology and an integrated control mode, the designed network realizes non-blocking communication. The time delay of the MIN and the number of switches it requires are also analyzed theoretically, and the derived results confirm that the designed MIN is feasible. For the same guaranteed communication quality, the MIN uses the fewest switches and completes communication more efficiently.
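Batcher's constructions build a sorting network from fixed stages of 2×2 compare-exchange elements, which is why they map naturally onto the stages of a MIN, with each comparator playing the role of a 2×2 switch. The sketch below generates Batcher's bitonic sorter for 8 inputs and checks it with the 0-1 principle. The abstract does not say which Batcher variant or stage layout the paper uses, so treat this as a generic illustration; the 6 stages and 24 comparators it produces are properties of the bitonic sorter, not the paper's switch count.

```python
from itertools import product
from typing import List, Sequence, Tuple

Stage = List[Tuple[int, int]]  # (lo, hi): after the exchange, lo holds the minimum


def bitonic_network(n: int) -> List[Stage]:
    """Comparator stages of Batcher's bitonic sorter for n = 2**k inputs."""
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two >= 2")
    stages: List[Stage] = []
    k = 2
    while k <= n:          # size of the bitonic sequences being merged
        j = k // 2
        while j > 0:       # compare-exchange distance within this merge step
            stage: Stage = []
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    stage.append((i, partner) if ascending else (partner, i))
            stages.append(stage)
            j //= 2
        k *= 2
    return stages


def run_network(stages: Sequence[Stage], data: Sequence[int]) -> List[int]:
    """Apply every 2x2 compare-exchange element, stage by stage."""
    out = list(data)
    for stage in stages:
        for lo, hi in stage:
            if out[lo] > out[hi]:
                out[lo], out[hi] = out[hi], out[lo]
    return out


net = bitonic_network(8)
print(len(net), sum(len(s) for s in net))
# 0-1 principle: sorting every 0/1 input implies sorting every input.
assert all(run_network(net, bits) == sorted(bits) for bits in product((0, 1), repeat=8))
print(run_network(net, [5, 7, 1, 3, 0, 6, 2, 4]))
```

Because every stage has a fixed wiring pattern, the network's delay is set by its stage count, independent of the data being sorted.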
Abstract: The Godson project is the first attempt to design high-performance general-purpose microprocessors in China. This paper introduces the microarchitecture of the Godson-2 processor, a 64-bit, 4-issue, out-of-order execution RISC processor that implements the 64-bit MIPS-like instruction set. Aggressive out-of-order execution techniques (such as register mapping, branch prediction, and dynamic scheduling) and cache techniques (such as a non-blocking cache, load speculation, and dynamic memory disambiguation) help the Godson-2 processor achieve high performance even at a modest clock frequency. The Godson-2 processor has been physically implemented in a 6-metal 0.18 μm CMOS technology using an automatic placement and routing flow, aided by some hand-crafted library cells and macros. The chip measures 6,700 μm × 6,200 μm, and the clock cycle at the typical corner is 2.3 ns (about 435 MHz).
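A non-blocking cache keeps serving hits while earlier misses are outstanding, typically by tracking each pending line in a miss status holding register (MSHR). The sketch below models only that bookkeeping: a new miss allocates an MSHR, a second miss to the same line merges into it, and the cache stalls only when every MSHR is busy. The MSHR count and 32-byte line size are hypothetical parameters, and the class does not describe Godson-2's actual cache design.

```python
from typing import Dict, List, Set


class NonBlockingCache:
    """Toy model of miss handling in a non-blocking cache (no data, no eviction)."""

    def __init__(self, num_mshrs: int, line_bytes: int = 32) -> None:
        self.num_mshrs = num_mshrs
        self.line_bytes = line_bytes
        self.resident: Set[int] = set()        # line numbers present in the cache
        self.mshrs: Dict[int, List[int]] = {}  # pending line -> waiting request ids

    def access(self, req_id: int, addr: int) -> str:
        line = addr // self.line_bytes
        if line in self.resident:
            return "hit"                       # served even while misses are pending
        if line in self.mshrs:
            self.mshrs[line].append(req_id)    # secondary miss merges into the MSHR
            return "merged"
        if len(self.mshrs) >= self.num_mshrs:
            return "stall"                     # only now must the pipeline wait
        self.mshrs[line] = [req_id]            # primary miss allocates an MSHR
        return "miss"

    def fill(self, line: int) -> List[int]:
        """Memory returns `line`: free its MSHR and wake every waiting request."""
        waiters = self.mshrs.pop(line)
        self.resident.add(line)
        return waiters


cache = NonBlockingCache(num_mshrs=2)
results = [cache.access(1, 0), cache.access(2, 8), cache.access(3, 64), cache.access(4, 128)]
woken = cache.fill(0)
results.append(cache.access(5, 4))  # hit on line 0 while line 2 is still pending
print(results, woken)
```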
Funding: Supported by the National Natural Science Foundation of China for Distinguished Young Scholars under Grant No. 60325205, the National Natural Science Foundation of China under Grant No. 60673146, the National High Technology Development 863 Program of China under Grant Nos. 2002AA110010, 2005AA110010, and 2005AA119020, and the National Grand Fundamental Research 973 Program of China under Grant No. 2005CB321600.
Abstract: This paper introduces the microarchitecture and physical implementation of the Godson-2E processor, a four-issue superscalar RISC processor that supports the 64-bit MIPS instruction set. Aggressive out-of-order execution and memory hierarchy techniques help Godson-2E achieve high performance. The Godson-2E processor has been physically designed in a 7-metal 90 nm CMOS process using a cell-based methodology with some bit-sliced manual placement and a number of hand-crafted cells and macros. The processor runs at 1 GHz and achieves a SPEC CPU2000 rate higher than 500.
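Aggressive out-of-order execution of the kind described for Godson-2E typically relies on register renaming (the "register mapping" listed for Godson-2): each new destination gets a fresh physical register, removing false write-after-read and write-after-write dependences so independent instructions can issue out of order. The sketch below is a textbook rename stage with a register alias table and a free list; register counts and names are hypothetical, physical registers are never freed (a real design recycles them at commit), and it does not describe Godson-2E's actual implementation.

```python
from collections import deque
from typing import Deque, Dict, List, Optional, Tuple

# (opcode, architectural destination or None, architectural sources)
Instr = Tuple[str, Optional[str], List[str]]
Renamed = Tuple[str, Optional[str], List[str]]


class Renamer:
    """Register alias table (RAT) plus a free list of physical registers."""

    def __init__(self, arch_regs: List[str], num_phys: int) -> None:
        if num_phys < len(arch_regs):
            raise ValueError("need at least one physical register per architectural one")
        self.rat: Dict[str, str] = {r: f"p{i}" for i, r in enumerate(arch_regs)}
        self.free: Deque[str] = deque(f"p{i}" for i in range(len(arch_regs), num_phys))

    def rename(self, instr: Instr) -> Renamed:
        op, dst, srcs = instr
        psrcs = [self.rat[s] for s in srcs]  # read sources before remapping the dest
        pdst = None
        if dst is not None:
            if not self.free:
                raise RuntimeError("stall: no free physical register")
            pdst = self.free.popleft()
            self.rat[dst] = pdst  # later readers of dst see the new mapping
        return op, pdst, psrcs


ren = Renamer(["r1", "r2", "r3"], num_phys=6)
program: List[Instr] = [
    ("add", "r1", ["r1", "r2"]),
    ("mul", "r2", ["r1", "r3"]),
    ("sub", "r1", ["r2", "r1"]),  # writes r1 again: the WAW on r1 is removed by renaming
]
for instr in program:
    print(ren.rename(instr))
```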