With the rapid development of machine learning,the demand for high-efficient computing becomes more and more urgent.To break the bottleneck of the traditional Von Neumann architecture,computing-in-memory(CIM)has attra...With the rapid development of machine learning,the demand for high-efficient computing becomes more and more urgent.To break the bottleneck of the traditional Von Neumann architecture,computing-in-memory(CIM)has attracted increasing attention in recent years.In this work,to provide a feasible CIM solution for the large-scale neural networks(NN)requiring continuous weight updating in online training,a flash-based computing-in-memory with high endurance(10^(9) cycles)and ultrafast programming speed is investigated.On the one hand,the proposed programming scheme of channel hot electron injection(CHEI)and hot hole injection(HHI)demonstrate high linearity,symmetric potentiation,and a depression process,which help to improve the training speed and accuracy.On the other hand,the low-damage programming scheme and memory window(MW)optimizations can suppress cell degradation effectively with improved computing accuracy.Even after 109 cycles,the leakage current(I_(off))of cells remains sub-10pA,ensuring the large-scale computing ability of memory.Further characterizations are done on read disturb to demonstrate its robust reliabilities.By processing CIFAR-10 tasks,it is evident that~90%accuracy can be achieved after 109 cycles in both ResNet50 and VGG16 NN.Our results suggest that flash-based CIM has great potential to overcome the limitations of traditional Von Neumann architectures and enable high-performance NN online training,which pave the way for further development of artificial intelligence(AI)accelerators.展开更多
The erase voltage impact on the 0.18μm triple self-aligned split-gate flash endurance is studied.An optimized erase voltage is necessary in order to achieve the best endurance.A lower erase voltage can cause more cel...The erase voltage impact on the 0.18μm triple self-aligned split-gate flash endurance is studied.An optimized erase voltage is necessary in order to achieve the best endurance.A lower erase voltage can cause more cell current degradation by increasing its sensitivity to the floating gate voltage drop,which is induced by tunnel oxide charge trapping during program/erase cycling.A higher erase voltage also aggravates the endurance degradation by introducing select gate oxide charge trapping.A progressive erase voltage method is proposed and demonstrated to better balance the two degradation mechanisms and thus further improve endurance performance.展开更多
The“memory wall”of traditional von Neumann computing systems severely restricts the efficiency of data-intensive task execution,while in-memory computing(IMC)architecture is a promising approach to breaking the bott...The“memory wall”of traditional von Neumann computing systems severely restricts the efficiency of data-intensive task execution,while in-memory computing(IMC)architecture is a promising approach to breaking the bottleneck.Although variations and instability in ultra-scaled memory cells seriously degrade the calculation accuracy in IMC architectures,stochastic computing(SC)can compensate for these shortcomings due to its low sensitivity to cell disturbances.Furthermore,massive parallel computing can be processed to improve the speed and efficiency of the system.In this paper,by designing logic functions in NOR flash arrays,SC in IMC for the image edge detection is realized,demonstrating ultra-low computational complexity and power consumption(25.5 fJ/pixel at 2-bit sequence length).More impressively,the noise immunity is 6 times higher than that of the traditional binary method,showing good tolerances to cell variation and reliability degradation when implementing massive parallel computation in the array.展开更多
The temperature characteristics of the read current of the NOR embedded flash memory with a 1.5T-per-cell structure are theoretically analyzed and experimentally verified.We verify that for a cell programmed with a“1...The temperature characteristics of the read current of the NOR embedded flash memory with a 1.5T-per-cell structure are theoretically analyzed and experimentally verified.We verify that for a cell programmed with a“10”state,the read current is either increasing,decreasing,or invariable with the temperature,essentially depending on the reading overdrive voltage of the selected bitcell,or its programming strength.By precisely controlling the programming strength and thus manipulating its temperature coefficient,we propose a new setting method for the reference cells that programs each of reference cells to a charge state with a temperature coefficient closely tracking tail data cells,thereby solving the current coefficient mismatch and improving the read window.展开更多
基金This work was supported by the National Natural Science Foundation of China(Nos.62034006,92264201,and 91964105)the Natural Science Foundation of Shandong Province(Nos.ZR2020JQ28 and ZR2020KF016)the Program of Qilu Young Scholars of Shandong University.
文摘With the rapid development of machine learning,the demand for high-efficient computing becomes more and more urgent.To break the bottleneck of the traditional Von Neumann architecture,computing-in-memory(CIM)has attracted increasing attention in recent years.In this work,to provide a feasible CIM solution for the large-scale neural networks(NN)requiring continuous weight updating in online training,a flash-based computing-in-memory with high endurance(10^(9) cycles)and ultrafast programming speed is investigated.On the one hand,the proposed programming scheme of channel hot electron injection(CHEI)and hot hole injection(HHI)demonstrate high linearity,symmetric potentiation,and a depression process,which help to improve the training speed and accuracy.On the other hand,the low-damage programming scheme and memory window(MW)optimizations can suppress cell degradation effectively with improved computing accuracy.Even after 109 cycles,the leakage current(I_(off))of cells remains sub-10pA,ensuring the large-scale computing ability of memory.Further characterizations are done on read disturb to demonstrate its robust reliabilities.By processing CIFAR-10 tasks,it is evident that~90%accuracy can be achieved after 109 cycles in both ResNet50 and VGG16 NN.Our results suggest that flash-based CIM has great potential to overcome the limitations of traditional Von Neumann architectures and enable high-performance NN online training,which pave the way for further development of artificial intelligence(AI)accelerators.
文摘The erase voltage impact on the 0.18μm triple self-aligned split-gate flash endurance is studied.An optimized erase voltage is necessary in order to achieve the best endurance.A lower erase voltage can cause more cell current degradation by increasing its sensitivity to the floating gate voltage drop,which is induced by tunnel oxide charge trapping during program/erase cycling.A higher erase voltage also aggravates the endurance degradation by introducing select gate oxide charge trapping.A progressive erase voltage method is proposed and demonstrated to better balance the two degradation mechanisms and thus further improve endurance performance.
基金supported by the National Natural Science Foundation of China(Nos.62034006,91964105,61874068)the China Key Research and Development Program(No.2016YFA0201802)+1 种基金the Natural Science Foundation of Shandong Province(No.ZR2020JQ28)Program of Qilu Young Scholars of Shandong University。
文摘The“memory wall”of traditional von Neumann computing systems severely restricts the efficiency of data-intensive task execution,while in-memory computing(IMC)architecture is a promising approach to breaking the bottleneck.Although variations and instability in ultra-scaled memory cells seriously degrade the calculation accuracy in IMC architectures,stochastic computing(SC)can compensate for these shortcomings due to its low sensitivity to cell disturbances.Furthermore,massive parallel computing can be processed to improve the speed and efficiency of the system.In this paper,by designing logic functions in NOR flash arrays,SC in IMC for the image edge detection is realized,demonstrating ultra-low computational complexity and power consumption(25.5 fJ/pixel at 2-bit sequence length).More impressively,the noise immunity is 6 times higher than that of the traditional binary method,showing good tolerances to cell variation and reliability degradation when implementing massive parallel computation in the array.
文摘The temperature characteristics of the read current of the NOR embedded flash memory with a 1.5T-per-cell structure are theoretically analyzed and experimentally verified.We verify that for a cell programmed with a“10”state,the read current is either increasing,decreasing,or invariable with the temperature,essentially depending on the reading overdrive voltage of the selected bitcell,or its programming strength.By precisely controlling the programming strength and thus manipulating its temperature coefficient,we propose a new setting method for the reference cells that programs each of reference cells to a charge state with a temperature coefficient closely tracking tail data cells,thereby solving the current coefficient mismatch and improving the read window.