Funding: Project supported by the National Natural Science Foundation of China (Grant Nos. 61673389, 61273202, 61134008, and 11404113).
Abstract: A novel quantum memory scheme based on adjustable interaction is proposed for quantum data buses in scalable quantum computers. Our investigation focuses on a hybrid quantum system comprising coupled flux qubits and a nitrogen–vacancy center ensemble. In our scheme, the transmission and the storage (retrieval) of a quantum state are performed in two separate steps, which are controlled by adjusting the coupling strength between the computing unit and the quantum memory. The scheme not only reduces the quantum state transmission time but also increases the robustness of the system against detuning caused by magnetic noise. Compared with the previous memory scheme, about 80% of the transmission time is saved. Moreover, an example shows that the fidelity in our scheme can reach 0.99 even in the presence of detuning, whereas the fidelity in the previous scheme is 0.75.
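The link between coupling strength and robustness to detuning can be illustrated with a textbook two-level exchange model. The sketch below is only an assumption-laden illustration, not the hybrid flux-qubit/NV-ensemble model of the paper: it assumes a Hamiltonian H = (delta/2)·sigma_z + g·sigma_x with hbar = 1, and the names `g`, `delta`, and the helper functions are hypothetical. It also converts the reported 80% time saving into an equivalent speedup factor.

```python
import math


def max_transfer_probability(g: float, delta: float) -> float:
    """Peak population transfer for H = (delta/2)*sigma_z + g*sigma_x (hbar = 1).

    g is the coupling strength and delta the detuning (assumed names).
    """
    return 4 * g**2 / (4 * g**2 + delta**2)


def transfer_probability(g: float, delta: float, t: float) -> float:
    """Population transferred after time t in the same two-level model."""
    omega = math.sqrt(4 * g**2 + delta**2)  # generalized Rabi frequency
    return max_transfer_probability(g, delta) * math.sin(omega * t / 2) ** 2


def speedup_from_time_saved(fraction_saved: float) -> float:
    """Speedup factor implied by saving a given fraction of the original time."""
    return 1.0 / (1.0 - fraction_saved)


if __name__ == "__main__":
    delta = 1.0  # fixed detuning, arbitrary units
    for g in (0.5, 1.0, 5.0):
        print(f"g={g}: peak transfer = {max_transfer_probability(g, delta):.3f}")
    print(f"80% time saved -> {speedup_from_time_saved(0.8):.1f}x faster")
```

In this toy model the peak transfer approaches 1 as g grows relative to delta, which is the qualitative reason a larger (adjustable) coupling tolerates more detuning. The numbers it prints are not meant to reproduce the 0.99 and 0.75 fidelities quoted in the abstract.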
Funding: Supported by the National High Technology Research and Development Program of China (Grant No. 2012AA012701).
Abstract: The computational capability of a coarse-grained reconfigurable array (CGRA) can be significantly restrained by data and context memory bandwidth bottlenecks. Traditionally, two methods have been used to resolve the context problem. One method loads the context into the CGRA at run time. This method occupies very little on-chip memory but induces very large latency, which leads to low computational efficiency. The other method adopts a multi-context structure. This method loads the context into the on-chip context memory at the boot phase. Broadcasting the pointer of a set of contexts changes the hardware configuration on a cycle-by-cycle basis. The size of the context memory induces a large area overhead in multi-context structures, which results in major restrictions on application complexity. This paper proposes a Predictable Context Cache (PCC) architecture to address the above context issues by buffering the context inside a CGRA. In this architecture, context is dynamically transferred into the CGRA. Utilizing a PCC significantly reduces the on-chip context memory, and the complexity of the applications running on the CGRA is no longer restricted by the size of the on-chip context memory. For the data bandwidth issue, data preloading is the most frequently used approach to hide input data latency and speed up the data transmission process. Rather than fundamentally reducing the amount of input data, it overlaps data transfers with computation. However, the data preloading method cannot work efficiently because data transmission becomes the critical path as the scale of the reconfigurable array increases. This paper also presents a Hierarchical Data Memory (HDM) architecture as a solution to this efficiency problem. In this architecture, high internal bandwidth is provided to buffer both reused input data and intermediate data. The HDM architecture relieves the external memory of the data transfer burden, so that performance is significantly improved. As a result of using PCC and HDM, experiments running mainstream video decoding programs achieved performance improvements of 13.57%–19.48% with a reasonable memory size. Consequently, 1080p@35.7fps H.264 High Profile video decoding can be achieved on the PCC and HDM architecture at a 200 MHz working frequency. Further, the size of the on-chip context memory no longer restricts complex applications, which were executed efficiently on the PCC and HDM architecture.
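The throughput figure above implies a per-frame and per-macroblock cycle budget, which can be worked out with simple arithmetic. The sketch below only assumes the standard 16×16 H.264 macroblock size and a 1920×1080 frame for "1080p"; the function names are hypothetical, and the clock rate and frame rate are the ones quoted in the abstract.

```python
import math

CLOCK_HZ = 200e6         # working frequency quoted in the abstract
FRAMES_PER_SECOND = 35.7  # decoding rate quoted in the abstract
WIDTH, HEIGHT = 1920, 1080  # assumed pixel dimensions for "1080p"
MACROBLOCK = 16          # H.264 macroblock edge length in pixels


def cycles_per_frame(clock_hz: float, fps: float) -> float:
    """Clock cycles available to decode one frame."""
    return clock_hz / fps


def macroblocks_per_frame(width: int, height: int, mb: int = MACROBLOCK) -> int:
    """Number of macroblocks, rounding partial rows and columns up."""
    return math.ceil(width / mb) * math.ceil(height / mb)


def cycles_per_macroblock(clock_hz: float, fps: float, width: int, height: int) -> float:
    """Average cycle budget per macroblock at the given clock and frame rate."""
    return cycles_per_frame(clock_hz, fps) / macroblocks_per_frame(width, height)


if __name__ == "__main__":
    print(f"cycles per frame:      {cycles_per_frame(CLOCK_HZ, FRAMES_PER_SECOND):,.0f}")
    print(f"macroblocks per frame: {macroblocks_per_frame(WIDTH, HEIGHT)}")
    print(f"cycles per macroblock: "
          f"{cycles_per_macroblock(CLOCK_HZ, FRAMES_PER_SECOND, WIDTH, HEIGHT):.1f}")
```

Under these assumptions the array has roughly 5.6 million cycles per frame, spread over 8160 macroblocks, which gives a sense of how tight the context and data bandwidth budgets are.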
Funding: Supported by the National Natural Science Foundation of China under Grant No. 61161001.
Abstract: Double data rate synchronous dynamic random access memory (DDR3) has become one of the most mainstream memory technologies in current server and computer systems. To quickly set up a system-level signal integrity (SI) simulation flow for the DDR3 interface, this paper introduces two system-level SI simulation methodologies: board-level S-parameter extraction in the frequency domain and system-level simulation assumptions in the time domain. After comparing the flows of Speed2000 and PowerSI/Hspice, PowerSI is chosen for the printed circuit board (PCB) board-level S-parameter extraction, and a Tektronix oscilloscope (TDS7404) is used for the DDR3 waveform measurement. The lab measurements show good agreement with the simulation. The study therefore recommends the combination of PowerSI and Hspice for quick system-level DDR3 SI simulation.
Funding: Project supported by the National Natural Science Foundation of China (Grant Nos. 61178059 and 61137002) and the Key Program of the Science and Technology Commission of Shanghai Municipality, China (Grant No. 11jc1413300).
Abstract: Four different states of Si15Sb85 and Ge2Sb2Te5 phase change memory thin films are obtained by modulating the degree of crystallization through laser initialization at different powers or annealing at different temperatures. The polarization characteristics of these two four-level phase change recording media are analyzed systematically. A simple and effective readout scheme is then proposed, and the readout signal is numerically simulated. The results show that a high-contrast polarization readout can be obtained over a wide wavelength range for four-level phase change recording media made from common phase change materials. This study contributes to an in-depth understanding of the physical mechanisms and provides technical approaches to multilevel phase change recording.
Abstract: In some military application scenarios, Unmanned Aerial Vehicles (UAVs) need to perform missions with the assistance of on-board cameras when radar is not available and communication is interrupted, which poses challenges for UAV autonomous navigation and collision avoidance. In this paper, an improved deep-reinforcement-learning algorithm, Deep Q-Network with a Faster R-CNN model and a Data Deposit Mechanism (FRDDM-DQN), is proposed. A Faster R-CNN model (FR) is introduced and optimized to extract obstacle information from images, and a new replay memory Data Deposit Mechanism (DDM) is designed to train an agent with better performance. During training, a two-part training approach is used to reduce the time spent on training, as well as on retraining when the scenario changes. To verify the performance of the proposed method, a series of experiments, including training experiments, test experiments, and typical-episode experiments, is conducted in a 3D simulation environment. Experimental results show that the agent trained by the proposed FRDDM-DQN is able to navigate autonomously and avoid collisions, and that it performs better than FR-DQN, FR-DDQN, FR-Dueling DQN, the YOLO-based YDDM-DQN, and FR-ODQN, which is based on the original FR output.
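For readers unfamiliar with the replay memory that the DDM modifies, the following is a minimal sketch of a standard uniform experience replay buffer and the usual DQN temporal-difference target. It is not the Data Deposit Mechanism itself, whose details the abstract does not give; the class name, method names, and the discount factor default are assumptions made for illustration.

```python
import random
from collections import deque


class ReplayMemory:
    """Fixed-capacity buffer of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity, seed=None):
        self._buffer = deque(maxlen=capacity)  # oldest transitions are evicted first
        self._rng = random.Random(seed)

    def push(self, state, action, reward, next_state, done):
        """Store one transition observed by the agent."""
        self._buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        """Draw a uniform random minibatch without replacement."""
        return self._rng.sample(list(self._buffer), batch_size)

    def __len__(self):
        return len(self._buffer)


def td_target(reward, next_q_values, done, gamma=0.99):
    """Standard DQN target: r if the episode ended, else r + gamma * max_a Q(s', a)."""
    return reward if done else reward + gamma * max(next_q_values)


if __name__ == "__main__":
    memory = ReplayMemory(capacity=3, seed=0)
    for step in range(5):
        memory.push(state=step, action=0, reward=1.0, next_state=step + 1, done=False)
    print(len(memory), memory.sample(2))
```

A mechanism such as the DDM would change which transitions are stored or how they are drawn; the uniform policy here is only the common baseline.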
Funding: Supported by the National Natural Science Foundation of China (60673128).
Abstract: A partition checkpoint strategy based on data segment priority is presented for embedded real-time main memory database systems (ERTMMDBS). It aims to meet the timing constraints of the data and the transactions, and to reduce both the number of transactions missing their deadlines and the recovery time. The strategy takes into account the characteristics of the data and of the transactions associated with it. It partitions the database according to data segment priority and assigns each partition its own checkpoint frequency, so that partitions are checkpointed independently. The simulation results show that the partition checkpoint strategy decreases the ratio of transactions missing their deadlines.
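The idea of giving each partition its own checkpoint frequency can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how priority maps to frequency, so the rule used here (halving the interval for each priority level), the `Partition` type, and all function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    name: str
    priority: int                 # assumed convention: larger means more time-critical
    last_checkpoint: float = 0.0  # time of the most recent checkpoint


def checkpoint_interval(priority, base_interval=8.0):
    """Hypothetical mapping: each priority level halves the checkpoint interval."""
    return base_interval / (2 ** priority)


def due_partitions(partitions, now):
    """Partitions whose own interval has elapsed since their last checkpoint."""
    return [p for p in partitions
            if now - p.last_checkpoint >= checkpoint_interval(p.priority)]


def run_checkpoints(partitions, now):
    """Checkpoint every due partition independently and return their names."""
    checkpointed = []
    for p in due_partitions(partitions, now):
        p.last_checkpoint = now  # a real system would flush the partition here
        checkpointed.append(p.name)
    return checkpointed


if __name__ == "__main__":
    parts = [Partition("hot", 2), Partition("warm", 1), Partition("cold", 0)]
    for now in (2.0, 4.0, 8.0):
        print(now, run_checkpoints(parts, now))
```

High-priority segments are checkpointed more often, so less of their log has to be replayed after a failure, while low-priority segments add little checkpoint overhead to running transactions.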
Funding: Supported by the China-Montenegro 3rd Science & Technology Exchange and Cooperation Project (3-7), the Open Research Fund of Hunan Provincial Key Laboratory of Flexible Electronic Materials Genome Engineering (202005), and the Double First-Class Scientific Research International Cooperation Expansion Project of Changsha University of Science & Technology (2019ic18).
Abstract: An on-chip debug circuit based on the Joint Test Action Group (JTAG) interface is proposed for the L-digital signal processor (L-DSP). It provides debug functions such as storage resource access, central processing unit (CPU) pipeline control, hardware breakpoints/observation points, and parameter statistics. Compared with the traditional debug mode, the proposed debug circuit transfers data directly between peripherals and memory by adding a data test-direct memory access (DT-DMA) module, which greatly improves debug efficiency. The proposed circuit was designed in a 0.18 μm complementary metal-oxide-semiconductor (CMOS) process, with an area of 167234.76 μm² and a power consumption of 8.89 mW. The proposed debug circuit and the L-DSP were verified on a field programmable gate array (FPGA). Experimental results show that the proposed circuit has complete debug functions and that DT-DMA transfers debug data three times faster than the CPU.