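Abstract: In verifying single-event-effect (SEE) hardening designs for flip-chip devices, heavy ions must traverse several hundred micrometers of substrate material before they reach the sensitive region of the device, so the linear energy transfer (LET) of the ions in the sensitive region has to be calculated. Using 55 MeV/u ⁵⁸Ni ions accelerated by the Lanzhou heavy-ion accelerator, the SEE hardening design of a typical system implemented on a flip-chip Xilinx 5.5-million-gate field-programmable gate array (FPGA) was verified experimentally. The LET values in the test were analyzed with several methods, including SRIM, FLUKA and GEANT, and typical SRIM results were compared with experimental data obtained by the magnetic-deflection time-of-flight method; the comparison revealed some discrepancies from existing heavy-ion analysis results. Therefore, when ion LET is used as the primary parameter in hardening verification, the method for calculating heavy-ion LET (especially in the high-energy range) should be agreed upon in advance, in order to standardize the test procedure and improve the comparability of data.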
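The core calculation can be sketched as follows: step the ion through the substrate, subtract the energy it loses in each step, and read off the stopping power at the exit face. This is a minimal sketch, assuming a stopping-power table (energy vs. dE/dx in silicon) supplied by the user, for example exported from SRIM, and a simple fixed-step energy-loss integration; the function names and the tables in the usage note are hypothetical, and this is not the algorithm used inside SRIM, FLUKA or GEANT.

```python
from bisect import bisect_left

# Silicon density in mg/cm^3 (2.33 g/cm^3), used to convert dE/dx to LET.
SI_DENSITY_MG_PER_CM3 = 2330.0


def interp(x, xs, ys):
    """Piecewise-linear interpolation of ys(xs) at x; xs must be ascending.
    Values outside the table are clamped to the end points."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_left(xs, x)
    x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def let_after_substrate(e0_mev, thickness_um, energies_mev, dedx_mev_per_um, step_um=1.0):
    """Hypothetical helper: slow the ion through the substrate in fixed steps
    (dE = dE/dx(E) * dx per step) and return (residual energy in MeV,
    LET in MeV*cm^2/mg) at the exit face. Returns (0.0, 0.0) if the ion
    stops inside the substrate. Smaller step_um reduces integration error."""
    e = e0_mev
    remaining = thickness_um
    while remaining > 0 and e > 0:
        dx = min(step_um, remaining)
        e -= interp(e, energies_mev, dedx_mev_per_um) * dx
        remaining -= dx
    if e <= 0:
        return 0.0, 0.0
    dedx = interp(e, energies_mev, dedx_mev_per_um)
    # 1 MeV/um = 1e4 MeV/cm; dividing by density in mg/cm^3 gives MeV*cm^2/mg
    return e, dedx * 1e4 / SI_DENSITY_MG_PER_CM3
```

For a 55 MeV/u ⁵⁸Ni ion the entry energy is 55 × 58 = 3190 MeV, so a call would look like `let_after_substrate(3190.0, thickness_um, energies, dedx)` with the real table and substrate thickness substituted. The result is only as good as the stopping-power table supplied, which is one reason different codes can give different LET values for the same beam.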
Funding: Project supported by the Second Stage of the Brain Korea 21 Projects; supported by the Industrial Strategic Technology Development Program funded by the Ministry of Knowledge Economy (MKE, Korea) (10039239, "Development of Power Management System SoC Supporting Multi-Battery-Cells and Multi-Energy-Sources for Smart Phones and Smart Devices")
Abstract: A differential paired eFuse OTP (one-time programmable) memory cell that can be configured into a 2D (two-dimensional) eFuse cell array is proposed. The programmed eFuse resistance required for sensing is half that of the single-ended counterpart, and the bit-line (BL) data can be sensed without a reference voltage. Using this 2D array of differential paired eFuse OTP memory cells, we design a 32-bit eFuse OTP memory IP. A sense-amplifier-based D flip-flop (D-F/F) circuit serves as the BL sense amplifier (SA), and a sensing-margin test circuit with a variable pull-up load is also designed. Functional tests confirm that the designed 32-bit OTP memory IP operates normally on 30 sample dies.
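Why a differential pair needs no reference voltage can be shown with a purely behavioral model: each bit line is a resistive divider (pull-up load to VDD, eFuse to ground), and the sense amplifier only has to decide which of BL and its complement BLB is higher. This is a minimal sketch under stated assumptions: an ideal comparator with no offset, a hypothetical 1.8 V supply and hypothetical resistor values; it is not the circuit from the paper, and `passing_loads` is only an illustrative stand-in for a variable-pull-up-load margin sweep.

```python
# Hypothetical supply voltage; not taken from the source.
VDD = 1.8


def bitline_voltage(r_fuse, r_load, vdd=VDD):
    """Resistive divider: pull-up load from VDD to the bit line, eFuse to ground."""
    return vdd * r_fuse / (r_fuse + r_load)


def sense_differential(r_bl_fuse, r_blb_fuse, r_load):
    """Differential read: compare BL with its complement BLB; no reference needed.
    Assumed encoding: '1' programs the BL-side fuse, '0' the BLB-side fuse."""
    v_bl = bitline_voltage(r_bl_fuse, r_load)
    v_blb = bitline_voltage(r_blb_fuse, r_load)
    return 1 if v_bl > v_blb else 0


def sense_single_ended(r_fuse, r_load, v_ref):
    """Single-ended read: compare the bit line with a reference voltage."""
    return 1 if bitline_voltage(r_fuse, r_load) > v_ref else 0


def passing_loads(r_programmed, loads, v_ref):
    """Hypothetical margin sweep: pull-up loads at which a programmed fuse
    still reads as 1 in the single-ended scheme."""
    return [r for r in loads if sense_single_ended(r_programmed, r, v_ref) == 1]
```

With an unprogrammed fuse of 100 Ω, a weakly programmed fuse of only 200 Ω is still read correctly by `sense_differential(200, 100, 1000)`, whereas `sense_single_ended(200, 1000, VDD / 2)` reads it as 0, because the single-ended scheme needs the programmed resistance to clear a fixed reference rather than just exceed its partner.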
Abstract: A multicast replication algorithm is proposed for shared-memory switches. It uses a dedicated FIFO that performs multicast by replicating cells at the receiver, and this FIFO operates in parallel with the shared memory. Speedup is used to improve cell-loss and delay performance. A new analytical queueing model is developed based on a sub-timeslot approach. The cell-loss and delay performance of the system is analyzed and verified by simulation.
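The effect of speedup on a replicating multicast FIFO can be illustrated with a small slot-by-slot simulation in which each time slot is split into `speedup` sub-timeslots and one copy of the head-of-line cell is made per sub-timeslot. This is a minimal sketch, assuming a single finite multicast FIFO, tail drop when it is full, and no output-port contention or unicast shared-memory traffic; it is not the paper's analytical model, and the function name is hypothetical.

```python
from collections import deque


def simulate_multicast_fifo(arrivals, speedup, capacity):
    """arrivals[t] lists the fanouts of multicast cells arriving in slot t.
    Each slot has `speedup` sub-timeslots; one copy is made per sub-timeslot.
    A cell leaves once all its copies are made. Returns (lost, mean_delay)."""
    fifo = deque()  # entries: [arrival_slot, copies_left]
    lost = 0
    delays = []
    t = 0
    while t < len(arrivals) or fifo:
        if t < len(arrivals):
            for fanout in arrivals[t]:
                if fanout <= 0:
                    continue  # nothing to replicate
                if len(fifo) < capacity:
                    fifo.append([t, fanout])
                else:
                    lost += 1  # FIFO full: cell dropped
        for _ in range(speedup):
            if not fifo:
                break
            fifo[0][1] -= 1  # replicate one copy of the head cell
            if fifo[0][1] == 0:
                arrival_slot = fifo.popleft()[0]
                delays.append(t - arrival_slot)
        t += 1
    mean_delay = sum(delays) / len(delays) if delays else 0.0
    return lost, mean_delay
```

For the same input `[[4], [4], []]`, a speedup of 2 gives a mean delay of 1.5 slots, while a speedup of 4 lets every fanout-4 cell finish within its arrival slot, which is the kind of loss and delay improvement that speedup is meant to buy.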
Funding: Supported by the National High Technology Research and Development Program of China (Grant No. 2012AA012701)
Abstract: The computational capability of a coarse-grained reconfigurable array (CGRA) can be significantly limited by data and context memory bandwidth bottlenecks. Traditionally, two methods have been used to resolve this problem. One method loads the context into the CGRA at run time; it occupies very little on-chip memory but incurs very large latency, which leads to low computational efficiency. The other method adopts a multi-context structure, loading the context into the on-chip context memory at the boot phase; broadcasting the pointer to a set of contexts then changes the hardware configuration on a cycle-by-cycle basis. However, the size of the context memory imposes a large area overhead in multi-context structures, which severely restricts application complexity. This paper proposes a Predictable Context Cache (PCC) architecture that addresses these context issues by buffering the context inside the CGRA. In this architecture, context is dynamically transferred into the CGRA. Using a PCC significantly reduces the on-chip context memory, and the complexity of the applications running on the CGRA is no longer restricted by the size of the on-chip context memory. For the data bandwidth issue, data preloading is the most frequently used approach to hide input-data latency and speed up data transmission. Rather than fundamentally reducing the amount of input data, it processes data transfer and computation in parallel. However, data preloading cannot work efficiently because data transmission becomes the critical path as the reconfigurable array scale increases. This paper therefore also presents a Hierarchical Data Memory (HDM) architecture as a solution to this efficiency problem. In this architecture, high internal bandwidth is provided to buffer both reused input data and intermediate data. The HDM architecture relieves the external memory of the data transfer burden, so performance is significantly improved. Using PCC and HDM, experiments running mainstream video decoding programs achieved performance improvements of 13.57%–19.48% with a reasonable memory size. As a result, H.264 High Profile video decoding at 1080p and 35.7 fps can be achieved on the PCC and HDM architecture at a 200 MHz working frequency. Furthermore, the size of the on-chip context memory no longer restricts complex applications, which execute efficiently on the PCC and HDM architecture.
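The argument that preloading stops helping once data transmission becomes the critical path can be made concrete with a simple timing model. This is a minimal sketch, assuming double-buffered preloading (tile i is computed while tile i+1 is loaded) and unit-less hypothetical load and compute times; `with_reuse` is a hypothetical stand-in for cutting external traffic through on-chip reuse, not a model of the paper's HDM design.

```python
def serial_time(loads, computes):
    """No preloading: every tile is loaded, then computed."""
    return sum(loads) + sum(computes)


def preloaded_time(loads, computes):
    """Double-buffered preloading: load tile 0, then overlap computing
    tile i with loading tile i+1, then compute the last tile."""
    total = loads[0]
    for i in range(len(loads) - 1):
        total += max(computes[i], loads[i + 1])  # the slower of the two sets the pace
    return total + computes[-1]


def with_reuse(loads, reuse_fraction):
    """Hypothetical: a fraction of each tile's input is served on chip,
    so only the remainder is fetched from external memory."""
    return [load * (1.0 - reuse_fraction) for load in loads]
```

When compute dominates (loads `[4, 4, 4]`, computes `[6, 6, 6]`), preloading cuts the time from 30 to 22 units. When the array gets faster and loads dominate (loads `[8, 8, 8]`, computes `[3, 3, 3]`), preloading still leaves 27 units because every step waits on a load; only reducing external traffic, here by a hypothetical 50% reuse, brings it down to 15.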