1,276 articles found
Design and implementation of dual-mode configurable memory architecture for CNN accelerator
1
Authors: 山蕊, LI Xiaoshuo, GAO Xu, HUO Ziqing. High Technology Letters (EI, CAS), 2024, No. 2, pp. 211-220 (10 pages)
With the rapid development of deep learning algorithms, computational complexity and functional diversity are increasing rapidly. However, the gap between high computational density and insufficient memory bandwidth under the traditional von Neumann architecture is widening. An analysis of the algorithmic characteristics of convolutional neural networks (CNNs) shows that the access characteristics of convolution (CONV) and fully connected (FC) operations are very different. Based on this feature, a dual-mode reconfigurable distributed memory architecture for CNN accelerators is designed. It can be configured in Bank mode or first-in first-out (FIFO) mode to accommodate the access needs of different operations. A programmable memory control unit is also designed, which controls the dual-mode configurable distributed memory architecture through customized access instructions and reduces data access delay. The proposed architecture is verified and tested by parallel implementation of several CNN algorithms. The experimental results show that the peak bandwidth reaches 13.44 GB·s^(-1) at an operating frequency of 120 MHz, which is 1.40, 1.12, 2.80 and 4.70 times the peak bandwidth of existing works.
Keywords: distributed memory structure; neural network accelerator; reconfigurable array processor; configurable memory structure
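Research and Design of a Reconfigurable CNN Coprocessor for Edge Computing
2
Authors: 李伟, 陈億, 陈韬, 南龙梅, 杜怡然. 《电子与信息学报》 (EI, CAS, CSCD, PKU Core), 2024, No. 4, pp. 1499-1512 (14 pages)
With the development of deep learning, the parameter count and computational load of convolutional neural network (CNN) models have grown sharply, greatly raising the cost of deploying CNN algorithms on edge devices. To make CNN deployment on edge devices easier and to reduce inference latency and energy consumption, this paper proposes a reconfigurable CNN coprocessor architecture for edge computing. Based on a channel-wise dataflow, the proposed two-level distributed storage scheme addresses the power overhead and performance loss caused by large-scale on-chip data transfers and by heavy data movement between PEs during reconfigured operation. To avoid complex data interconnect and propagation mechanisms in the accelerator array and to reduce control complexity, the paper proposes a flexible local memory access mechanism and an address-translation-based padding mechanism. These let the coprocessor flexibly implement standard convolution, depthwise separable convolution, pooling, and fully connected operations of arbitrary sizes, improving the flexibility of the hardware architecture. The proposed coprocessor contains 256 PEs and 176 kB of on-chip private memory. Logic synthesis and place-and-route in a 55 nm CMOS process at the TT corner (25°C, 1.2 V) give a maximum clock frequency of 328 MHz and an area of 4.41 mm^(2). At an operating frequency of 320 MHz, the coprocessor reaches a peak performance of 163.8 GOPs and an area efficiency of 37.14 GOPs/mm^(2). Its energy efficiency on LeNet-5 and MobileNet is 210.7 GOPs/W and 340.08 GOPs/W, respectively, which meets the energy-efficiency and performance requirements of edge intelligent computing.
Keywords: hardware acceleration; convolutional neural network; reconfigurable; ASIC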
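The throughput figures in the abstract above can be cross-checked with simple arithmetic. The sketch below assumes that each PE completes one multiply-accumulate per cycle and that this counts as 2 operations; the abstract does not state this, and the function names are illustrative only.

```python
def peak_gops(num_pes: int, freq_mhz: float, ops_per_pe_per_cycle: int = 2) -> float:
    """Peak throughput in GOPs, assuming every PE is busy on every cycle.

    ops_per_pe_per_cycle=2 is an assumption (one multiply plus one add).
    """
    return num_pes * ops_per_pe_per_cycle * freq_mhz * 1e6 / 1e9


def area_efficiency(gops: float, area_mm2: float) -> float:
    """Throughput per unit area in GOPs/mm^2."""
    return gops / area_mm2


# Figures quoted in the abstract: 256 PEs, 320 MHz, 163.8 GOPs, 4.41 mm^2.
estimated_peak = peak_gops(256, 320)
reported_density = area_efficiency(163.8, 4.41)
print(f"estimated peak: {estimated_peak:.2f} GOPs")
print(f"area efficiency: {reported_density:.2f} GOPs/mm^2")
```

Under that assumption the estimate lands on the reported peak to within rounding, and dividing the reported peak by the reported area reproduces the quoted area efficiency.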
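Remaining Useful Life Prediction of IGBTs Based on an Attention-Mechanism CNN-BiLSTM Cited by: 2
3
Authors: 张金萍, 薛治伦, 陈航, 孙培奇, 高策, 段宜征. 《半导体技术》 (CAS, PKU Core), 2024, No. 4, pp. 373-379 (7 pages)
To address the reliability of insulated gate bipolar transistors (IGBTs), a remaining useful life (RUL) prediction model combining a convolutional neural network (CNN), a bidirectional long short-term memory (BiLSTM) network, and an attention mechanism is proposed for IGBT life prediction. In the model, the CNN extracts feature parameters, the BiLSTM extracts temporal information, and the attention mechanism weights the feature parameters. The proposed model is validated on an IGBT accelerated aging dataset. The results show that, compared with autoregressive integrated moving average (ARIMA), long short-term memory (LSTM), multi-layer LSTM (Multi-LSTM), and BiLSTM prediction models, the proposed model performs best on metrics such as root mean square error and coefficient of determination, confirming that it is effective for IGBT failure prediction.
Keywords: insulated gate bipolar transistor (IGBT); failure prediction; accelerated aging; long short-term memory (LSTM); attention mechanism; convolutional neural network (CNN)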
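Design of a PCNN_LSTM Hardware Accelerator for Lithium Battery SOC Estimation
4
Authors: 王巍, 夏旭, 丁辉, 吴浩, 郭家成. 《微电子学与计算机》, 2024, No. 10, pp. 106-116 (11 pages)
To overcome the poor accuracy, low computational efficiency, and low energy efficiency of traditional lithium battery state estimation, a PCNN_LSTM algorithm and hardware accelerator design for lithium battery state of charge (SOC) estimation are proposed. The algorithm combines the characteristics of convolutional neural networks and long short-term memory networks and can extract both spatial and temporal features of the input data, yielding more accurate estimates. To further improve computational efficiency, a hardware accelerator based on a field programmable gate array (FPGA) is designed. Exploiting the parallel computing and on-chip storage of the FPGA, the accelerator optimizes convolution and matrix multiplication through parallel pipelining and folded module reuse, implements the activation function module with piecewise linear fitting and shift operations, and implements the element_wise module with a time-division multiplexing strategy. This effectively reduces hardware resource consumption and improves overall performance while preserving accuracy. Experimental results show that a PCNN-LSTM accelerator with a 100 MHz input clock, implemented on a Zynq UltraScale+ MPSoC ZCU102 FPGA, achieves a peak throughput of 75.84 GOP/s and an energy efficiency of 60.915 GOP/W.
Keywords: lithium battery; state of charge; convolutional neural network; long short-term memory neural network; FPGA; hardware acceleration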
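The abstract above mentions an activation function built from piecewise linear fitting and shifts but gives no segment table. As an illustration only, the sketch below uses the well-known PLAN approximation of the sigmoid, whose slopes are powers of two so that each multiplication can become a right shift in hardware. The breakpoints and coefficients are those of PLAN, not necessarily those of the paper.

```python
import math


def plan_sigmoid(x: float) -> float:
    """Piecewise linear sigmoid with power-of-two slopes (PLAN scheme, assumed here)."""
    a = abs(x)
    if a >= 5.0:
        y = 1.0
    elif a >= 2.375:
        y = 0.03125 * a + 0.84375  # slope 1/32, a right shift by 5 in hardware
    elif a >= 1.0:
        y = 0.125 * a + 0.625      # slope 1/8, a right shift by 3
    else:
        y = 0.25 * a + 0.5         # slope 1/4, a right shift by 2
    # Use the symmetry sigmoid(-x) = 1 - sigmoid(x) for negative inputs.
    return y if x >= 0 else 1.0 - y


def max_abs_error(lo: float = -8.0, hi: float = 8.0, steps: int = 3201) -> float:
    """Largest gap between the approximation and the exact sigmoid on a grid."""
    worst = 0.0
    for i in range(steps):
        x = lo + (hi - lo) * i / (steps - 1)
        exact = 1.0 / (1.0 + math.exp(-x))
        worst = max(worst, abs(plan_sigmoid(x) - exact))
    return worst


print(f"max abs error on [-8, 8]: {max_abs_error():.4f}")
```

A fixed-point implementation would replace the floating-point multiplications with shifts and add the constant offsets; the float version here only shows the shape of the approximation and its error.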
FPGA Optimized Accelerator of DCNN with Fast Data Readout and Multiplier Sharing Strategy Cited by: 1
5
Authors: Tuo Ma, Zhiwei Li, Qingjiang Li, Haijun Liu, Zhongjin Zhao, Yinan Wang. Computers, Materials & Continua (SCIE, EI), 2023, No. 12, pp. 3237-3263 (27 pages)
With the continuous development of deep learning, the Deep Convolutional Neural Network (DCNN) has attracted wide attention in industry due to its high accuracy in image classification. Compared with other DCNN hardware deployment platforms, the Field Programmable Gate Array (FPGA) has the advantages of programmability, low power consumption, parallelism, and low cost. However, the enormous amount of computation in DCNNs and the limited logic capacity of FPGAs restrict the energy efficiency of DCNN accelerators. The traditional sequential sliding window method can improve the throughput of a DCNN accelerator through data multiplexing, but its data multiplexing rate is low because it repeatedly reads the data between rows. This paper proposes a fast data readout strategy based on a circular sliding window data reading method, which improves the multiplexing rate of data between rows by optimizing the memory access order of the input data. In addition, the multiplication bit width of the DCNN accelerator is much smaller than that of the Digital Signal Processing (DSP) blocks on the FPGA, so resources are wasted if each multiplication occupies a whole DSP. A multiplier sharing strategy is therefore proposed: the multiplier of the accelerator is customized so that a single DSP block can complete multiple groups of 4-, 6-, and 8-bit signed multiplications in parallel. Finally, an FPGA optimized accelerator is proposed based on these two strategies. The accelerator is written in Verilog and deployed on a Xilinx VCU118. When the accelerator recognizes the CIFAR-10 dataset, its energy efficiency is 39.98 GOPS/W, a 1.73× improvement over previous DCNN FPGA accelerators. When it recognizes the ImageNet dataset, its energy efficiency is 41.12 GOPS/W, which is 1.28× to 3.14× that of other designs.
Keywords: FPGA; accelerator; DCNN; fast data readout strategy; multiplier sharing strategy; network quantization; energy efficient
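The multiplier sharing idea in the abstract above can be illustrated with a generic operand-packing trick: two narrow signed operands are placed in one wide word, multiplied once by a shared operand, and the two products are separated afterwards. The sketch below is a software model under stated assumptions. The 18-bit field spacing and the function name are hypothetical, and the paper's actual packing for 4-, 6-, and 8-bit operands is not described in the abstract.

```python
SHIFT = 18  # assumed field spacing; b*c for 8-bit inputs fits easily in 18 signed bits


def packed_dual_multiply(a: int, b: int, c: int, shift: int = SHIFT) -> tuple:
    """Return (a*c, b*c) for signed 8-bit inputs using one wide multiplication."""
    packed = (a << shift) + b            # one wide operand carrying both a and b
    prod = packed * c                    # equals (a*c << shift) + b*c
    low = prod & ((1 << shift) - 1)      # low field, read as an unsigned value
    if low >= 1 << (shift - 1):          # sign-extend the field to recover b*c
        low -= 1 << shift
    high = (prod - low) >> shift         # undo the borrow from a negative b*c, then read a*c
    return high, low


print(packed_dual_multiply(-7, 5, 3))
print(packed_dual_multiply(127, -128, -128))
```

The subtraction before the final shift matters: when `b*c` is negative it borrows from the upper field, so reading the upper bits directly would be off by one.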
Reliability analysis of slope stability by neural network, principal component analysis, and transfer learning techniques Cited by: 1
6
Authors: Sheng Zhang, Li Ding, Menglong Xie, Xuzhen He, Rui Yang, Chenxi Tong. Journal of Rock Mechanics and Geotechnical Engineering (SCIE, CSCD), 2024, No. 10, pp. 4034-4045 (12 pages)
The prediction of slope stability is one of the critical concerns in geotechnical engineering. Conventional stochastic analysis of spatially variable slopes is time-consuming and computationally demanding. To assess slope stability problems with a more acceptable computational effort, many machine learning (ML) algorithms have been proposed. However, most ML-based techniques require the training data to be in the same feature space and to have the same distribution, and the model may need to be rebuilt when the spatial distribution changes. This paper presents a new ML-based algorithm that combines a principal component analysis (PCA)-based neural network (NN) with transfer learning (TL) techniques (i.e. PCA-NN-TL) to conduct the stability analysis of slopes with different spatial distributions. Monte Carlo simulation coupled with finite element analysis is first conducted for data acquisition, considering the spatial variability of the cohesive strength or friction angle of soils from eight slopes with the same geometry. The PCA method is incorporated into the neural network algorithm (i.e. PCA-NN) to increase computational efficiency by reducing the number of input variables. The PCA-NN algorithm is found to perform well in predicting slope stability for a given slope, in terms of both computational accuracy and computational effort, when compared with two other algorithms (i.e. NN and decision trees, DT). Furthermore, the PCA-NN-TL algorithm shows great potential in assessing slope stability even with less training data.
Keywords: Slope stability analysis; Monte Carlo simulation; Neural network (NN); Transfer learning (TL)
Design space exploration of neural network accelerator based on transfer learning
7
Authors: 吴豫章, ZHI Tian, SONG Xinkai, LI Xi. High Technology Letters (EI, CAS), 2023, No. 4, pp. 416-426 (11 pages)
With the increasing demand for computational power in artificial intelligence (AI) algorithms, dedicated accelerators have become a necessity. However, the complexity of hardware architectures, the vast design search space, and the complex tasks of accelerators pose significant challenges. Traditional search methods can become prohibitively slow as the search space keeps expanding. A design space exploration (DSE) method based on transfer learning is proposed, which reduces the time spent on repeated training and uses multi-task models for different tasks on the same processor. The proposed method accurately predicts the latency and energy consumption associated with neural network accelerator design parameters, enabling faster identification of optimal outcomes than traditional methods. Compared with other DSE methods that use a multilayer perceptron (MLP), the required training time is also shorter. Comparative experiments with other methods demonstrate that the proposed method improves the efficiency of DSE without compromising the accuracy of the results.
Keywords: design space exploration (DSE); transfer learning; neural network accelerator; multi-task learning
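Performance Study of an Internally Cooled Liquid Desiccant Dehumidifier Based on an ANN Model
8
Authors: 罗伊默, 常亚银, 李念平. 《湖南大学学报(自然科学版)》 (EI, CAS, CSCD, PKU Core), 2024, No. 9, pp. 198-205 (8 pages)
Liquid desiccant dehumidifiers have attracted wide attention because they can be driven by low-grade heat and offer high dehumidification efficiency, but the accuracy of predicting their mass transfer performance still needs improvement. This paper builds a single-channel internally cooled liquid desiccant dehumidification test rig and studies how different parameters affect mass transfer performance during dehumidification. An artificial neural network (ANN) model built on the MATLAB platform is used to predict mass transfer performance and is validated with the experimental data. The results show that the mean absolute relative deviation (MARD) between the Sherwood number (Sh) predicted by the ANN model and the experimental Sh is 4.07%. Compared with existing empirical correlations, the ANN model predicts more accurately. The ANN model is also used to study how Sh varies with different parameters, and thereby to analyze their influence on dehumidification performance.
Keywords: machine learning; neural network; liquid desiccant dehumidifier; parametric study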
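A CNN Data Quantization Method Based on a Reconfigurable Array
9
Authors: 朱家扬, 蒋林, 李远成, 宋佳, 刘帅. 《计算机应用研究》 (CSCD, PKU Core), 2024, No. 4, pp. 1070-1076 (7 pages)
The large number of convolution operations in convolutional neural network (CNN) models greatly increases network size, which prevents deployment on embedded hardware platforms, and a mismatch between data of different granularities and the underlying hardware structure lowers computational efficiency. To address these problems, this work builds on the reconfigurable array processor developed by the project team and targets arithmetic units that support multiple bit widths. Through hardware-software co-design and reconfigurable computing, it uses Kullback-Leibler (KL) divergence to set custom quantization thresholds and stochastic rounding for truncation in order to find the best radix-point position for fixed-length parameters. It also designs instructions that support parallel operation at multiple computing granularities together with their convolution mapping scheme, and on this basis implements dynamic data quantization at three different bit widths. Experimental results show that quantizing weights and feature maps to 8 bit each compresses the model to about 50% of its original size at an accuracy loss of 2%. Hardware tests with test images quantized to the three bit widths reach speedups of 1.012, 1.273, and 1.556, shorten execution time by up to 35.7%, and reduce memory accesses by up to 56.2%, while introducing a relative error of less than 1%. This indicates that the method enables efficient neural network computation at all three quantization bit widths, achieving both hardware acceleration and model compression.
Keywords: convolutional neural network; data quantization; reconfigurable architecture; parallel mapping; speedup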
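The abstract above names stochastic rounding as the truncation step of its fixed-point quantization. The sketch below shows only that step, as a plain-Python model: a value is rounded up with probability equal to its fractional remainder, so the quantized value is unbiased on average. The function names, the 8-bit word length, and the saturation behaviour are assumptions for illustration; the KL-divergence threshold search from the paper is not reproduced here.

```python
import math
import random


def stochastic_round_fixed(x: float, frac_bits: int, word_bits: int = 8, rng=random) -> int:
    """Quantize x to a signed fixed-point integer with `frac_bits` fractional bits."""
    scaled = x * (1 << frac_bits)
    floor_val = math.floor(scaled)
    # Round up with probability equal to the fractional remainder,
    # so the expected result equals `scaled` (before saturation).
    q = floor_val + (1 if rng.random() < scaled - floor_val else 0)
    lo, hi = -(1 << (word_bits - 1)), (1 << (word_bits - 1)) - 1
    return max(lo, min(hi, q))  # saturate to the assumed word length


def dequantize(q: int, frac_bits: int) -> float:
    """Map a fixed-point integer back to a real value."""
    return q / (1 << frac_bits)


rng = random.Random(0)
samples = [stochastic_round_fixed(0.3, 4, 8, rng) for _ in range(10000)]
mean_value = sum(dequantize(q, 4) for q in samples) / len(samples)
print(f"mean of dequantized samples: {mean_value:.4f}")
```

With 4 fractional bits, 0.3 falls between the grid points 4/16 and 5/16. Individual samples land on one or the other, while their average stays close to 0.3, which is the property that distinguishes stochastic rounding from round-to-nearest.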
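A CNN-Based Overlapped Spectrum Demodulation Algorithm and Its FPGA Implementation
10
Authors: 任嘉楠, 焦点, 杨铎, 徐春锋, 辛璟焘. 《半导体光电》 (CAS, PKU Core), 2024, No. 2, pp. 295-302 (8 pages)
To solve the problem of overlapping spectral signals in fiber Bragg grating (FBG) sensor networks, a demodulation algorithm for overlapped spectra using a convolutional neural network (CNN) model is proposed on a field programmable gate array (FPGA), and is implemented and accelerated in hardware. Fixed-point quantization of the model parameters compresses the storage footprint of the network and improves utilization of the FPGA's DSP resources. Hardware optimizations such as loop unrolling and array rearrangement improve real-time performance and determine the parallel computing scheme of the algorithm. The results show that, with a 100 MHz clock, the demodulation accuracy on the test set is 1.19 pm, the inference time is 14.96 μs per frame, and the spectral demodulation rate is 60 kHz, giving high accuracy and speed for demodulating overlapped FBG spectra.
Keywords: fiber grating; overlapped spectrum; FPGA; convolutional neural network; hardware acceleration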
Effect of different interventions on orthodontic tooth movement acceleration:A network meta-analysis
11
Authors: CHEN Hui-ying, ZHANG Li, ZHAN Le, WAN Ni, MO Li-wen. Journal of Hainan Medical University (CAS), 2024, No. 2, pp. 41-50 (10 pages)
Objective: To explore the effectiveness of various interventions in accelerating tooth movement through a systematic review and network meta-analysis (NMA). Methods: MEDLINE, EMBASE, Wiley Library, EBSCO, Web of Science, and the Cochrane Central Register of Controlled Trials were searched to identify relevant studies. ADDIS 1.16.6 and Stata 16.0 software were used for the NMA. Results: A total of 5,542 articles were retrieved. After screening by two independent investigators, 47 randomized controlled trials with 1,390 participants were included in this network meta-analysis. A total of 11 interventions were identified: Piezocision (Piezo), Photobiomodulation therapy (PBMT), Platelet-rich plasma (PRP), Electromagnetic field (EF), Low intensity laser therapy (LLLT), Low intensity pulsed ultrasound (LIPUS), Low-frequency vibrations (LFV), Distraction osteogenesis (DAD), Corticotomy (Corti), Micro-osteoperforations (MOPs), and Traditional orthodontic (OT). They were classified into 3 classes: surgical treatment, non-surgical treatment, and traditional orthodontic treatment. According to the SUCRA probability ranking of the best intervention effect, when orthodontic treatment lasted for 1 month, PBMT (90.6%), Piezo (87.4%) and MOPs (73.6%) were the top three interventions for improving the efficiency of canine tooth movement. When orthodontic treatment lasted for 2 months, Corti (75.7%), Piezo (69.6%) and LFV (58.9%) were the top three. When orthodontic treatment lasted for 3 months, Corti (73.3%), LLLT (68.4%) and LFV (60.8%) were the top three. Conclusion: PBMT and Piezo can improve the efficiency of canine tooth movement significantly after 1 month, while Corti and LFV improve it more after 2 and 3 months.
Keywords: Orthodontic tooth movement; Acceleration; Network meta-analysis; Randomized controlled trials
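Dataflow Optimization Method for DNNs on Bit-Level Composable Architectures
12
Authors: 高汉源, 宫磊, 王腾. 《计算机工程与应用》 (CSCD, PKU Core), 2024, No. 18, pp. 147-157 (11 pages)
Bit-level composable architectures support neural network computation with multiple data bit widths. Their hardware structures come in many variants, and each neural network model requires additional, hand-designed program scheduling. This process is time-consuming, hinders rapid iteration and deployment of software and hardware, and makes results hard to evaluate. Existing dataflow modeling work lacks bit-level computation descriptions and automated methods. To address these problems, an adaptive data scheduling optimization method for bit-level composable architectures based on dataflow modeling is proposed. Bit-level dataflow modeling is introduced, using several loop primitives and tensor-index relation matrices to describe the characteristics of bit-level composable hardware and the data scheduling process of applications. Data access information is extracted from the model and data reuse is counted to enable fast evaluation. A design space exploration framework is built that adaptively optimizes data scheduling for different applications and hardware design constraints. Design samples are generated by index matching and loop transformation, and greedy rules are added for pruning to improve exploration efficiency. Experiments are conducted on multiple applications under multiple hardware structure constraints. The results show better performance than state-of-the-art manually designed accelerators and data schedules.
Keywords: neural network accelerator; variable bit width; dataflow; design space exploration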
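Accelerated Degradation Test Evaluation Using an ANN-Wiener Process Considering Measurement Uncertainty
13
Authors: 李小璐, 锁斌. 《探测与控制学报》 (CSCD, PKU Core), 2024, No. 5, pp. 87-92, 98 (7 pages)
Considering the measurement uncertainty of key performance parameters in accelerated degradation tests, the measurement uncertainty is treated as interval numbers, and a reliability evaluation method for interval accelerated degradation data is established that combines an artificial neural network (ANN) with a Wiener process. The negative log-likelihood function of the accelerated degradation data is constructed from the ANN-Wiener process, and a genetic algorithm is used to solve for its unknown parameters, ultimately enabling reliability evaluation of interval accelerated degradation tests. The method is validated on an accelerated degradation test of lasers by comparing the absolute errors between the reliability estimated by this and other methods and the true reliability. The results show that the ANN-Wiener process gives more accurate reliability estimates and, by accounting for measurement uncertainty, yields conservative and optimistic estimates of laser reliability. At the same reliability level, the evaluated time obtained when measurement uncertainty is ignored is later than that of the conservative estimate, which would delay preventive maintenance, raise the risk of failure during operation, and increase losses caused by product failure.
Keywords: accelerated degradation test; Wiener process; measurement uncertainty; artificial neural network; genetic algorithm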
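The abstract above builds a negative log-likelihood from a Wiener degradation process. As a simplified illustration, the sketch below writes that likelihood for the textbook linear-drift case X(t) = mu*t + sigma*B(t), where independent increments satisfy dx ~ N(mu*dt, sigma^2*dt). The ANN drift, the interval-valued data, and the genetic-algorithm search of the paper are not modelled; the closed-form estimator and the sample data are assumptions made for this example.

```python
import math


def wiener_nll(times, values, mu, sigma):
    """Negative log-likelihood of a linear-drift Wiener process on one degradation path."""
    nll = 0.0
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        dx = values[i] - values[i - 1]
        var = sigma ** 2 * dt  # increment variance grows linearly with dt
        nll += 0.5 * math.log(2 * math.pi * var) + (dx - mu * dt) ** 2 / (2 * var)
    return nll


def wiener_mle(times, values):
    """Closed-form maximum-likelihood estimates for the linear-drift case."""
    n = len(times) - 1
    mu = (values[-1] - values[0]) / (times[-1] - times[0])
    sigma2 = sum(
        ((values[i] - values[i - 1]) - mu * (times[i] - times[i - 1])) ** 2
        / (times[i] - times[i - 1])
        for i in range(1, len(times))
    ) / n
    return mu, math.sqrt(sigma2)


# Hypothetical degradation measurements at uneven inspection times.
t = [0.0, 1.0, 2.0, 4.0, 7.0]
x = [0.0, 1.1, 1.9, 4.2, 6.8]
mu_hat, sigma_hat = wiener_mle(t, x)
print(f"mu = {mu_hat:.4f}, sigma = {sigma_hat:.4f}, NLL = {wiener_nll(t, x, mu_hat, sigma_hat):.4f}")
```

When the drift is a nonlinear function such as a neural network, no closed form exists and the same negative log-likelihood would be minimized numerically, which is where a search method such as a genetic algorithm comes in.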
NNL: a domain-specific language for neural networks Cited by: 1
14
Authors: Wang Bingrui, Chen Yunji. High Technology Letters (EI, CAS), 2020, No. 2, pp. 160-167 (8 pages)
In recent years, neural networks (NNs) have received increasing attention from both academia and industry. So far, the significant diversity among existing NNs and their hardware platforms has made NN programming a daunting task. In this paper, a domain-specific language (DSL) for NNs, the neural network language (NNL), is proposed to deliver productivity of NN programming and portable performance of NN execution on different hardware platforms. The productivity and flexibility of NN programming are enabled by abstracting NNs as a directed graph of blocks. The language describes 4 representative and widely used NNs and runs them on 3 different hardware platforms (CPU, GPU and NN accelerator). Experimental results show that NNs written in the proposed language are, on average, 14.5% better than the baseline implementations across these 3 platforms. Moreover, compared with the Caffe framework, which specifically targets the GPU platform, the code achieves similar performance.
Keywords: artificial neural network (NN); domain-specific language (DSL); neural network (NN) accelerator
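NN-EdgeBuilder: A High-Performance Neural Network Inference Framework for Edge Devices
15
Authors: 张萌, 张雨, 张经纬, 曹新野, 李鹤. 《电子与信息学报》 (EI, CSCD, PKU Core), 2023, No. 9, pp. 3132-3140 (9 pages)
Rapidly developing neural networks have achieved great success in fields such as object detection, and using neural network inference frameworks to deploy network models efficiently and automatically on various edge devices is an important current research direction. To this end, this paper designs NN-EdgeBuilder, a neural network inference framework for edge FPGAs. It uses a design space exploration algorithm based on multi-objective Bayesian optimization to fully explore the parallelism factor and quantization bit width of each network layer, and then calls high-performance, general-purpose hardware acceleration operators to generate low-latency, low-power neural network accelerators. Using NN-EdgeBuilder, UltraNet and VGG networks are deployed on an Ultra96-V2 FPGA. Compared with the state-of-the-art custom UltraNet accelerator, the generated UltraNet-P1 accelerator improves power consumption and energy efficiency by 17.71% and 21.54%, respectively. Compared with mainstream inference frameworks, the VGG accelerator generated by NN-EdgeBuilder improves energy efficiency by a factor of 4.40 and digital signal processor (DSP) computing efficiency by 50.65%.
Keywords: neural network inference framework; design space exploration; multi-objective Bayesian optimization; hardware acceleration operator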
A Survey of Accelerator Architectures for Deep Neural Networks Cited by: 6
16
Authors: Yiran Chen, Yuan Xie, Linghao Song, Fan Chen, Tianqi Tang. Engineering (SCIE, EI), 2020, No. 3, pp. 264-274 (11 pages)
Recently, due to the availability of big data and the rapid growth of computing power, artificial intelligence (AI) has regained tremendous attention and investment. Machine learning (ML) approaches have been successfully applied to solve many problems in academia and in industry. Although the explosion of big data applications is driving the development of ML, it also imposes severe challenges of data processing speed and scalability on conventional computer systems. Computing platforms that are dedicatedly designed for AI applications have been considered, ranging from a complement to von Neumann platforms to a "must-have" and stand-alone technical solution. These platforms, which belong to a larger category named "domain-specific computing," focus on specific customization for AI. In this article, we focus on summarizing the recent advances in accelerator designs for deep neural networks (DNNs), that is, DNN accelerators. We discuss various architectures that support DNN executions in terms of computing units, dataflow optimization, targeted network topologies, architectures on emerging technologies, and accelerators for emerging applications. We also provide our vision of future trends in AI chip design.
Keywords: Deep neural network; Domain-specific architecture; Accelerator
FPGA implementation of neural network accelerator for pulse information extraction in high energy physics Cited by: 2
17
Authors: Jun-Ling Chen, Peng-Cheng Ai, Dong Wang, Hui Wang, Ni Fang, De-Li Xu, Qi Gong, Yuan-Kang Yang. Nuclear Science and Techniques (SCIE, CAS, CSCD), 2020, No. 5, pp. 27-35 (9 pages)
Extracting the amplitude and time information from the shaped pulse is an important step in nuclear physics experiments. For this purpose, a neural network can be an alternative in off-line data processing. To process the data in real time and reduce the off-line data storage required in a trigger event, we designed a customized neural network accelerator on a field programmable gate array platform to implement specific layers in a convolutional neural network. The latter is then used in the front-end electronics of the detector. With fully reconfigurable hardware, a tested neural network structure was used for accurate timing of shaped pulses common in front-end electronics. This design can handle up to four channels of pulse signals at once. The peak performance of each channel is 1.665 giga operations per second at a working frequency of 25 MHz.
Keywords: Convolutional neural networks; Pulse shaping; Acceleration; Front-end electronics
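Life Prediction and Reliability Evaluation of Coated Self-Lubricating Spherical Plain Bearings for Aerospace Based on CNN and LSTM Cited by: 2
18
Authors: 刘云帆, 林亮行, 马国政, 孙建芳, 苏峰华, 郭伟玲, 朱丽娜, 王海斗. 《航天器环境工程》 (CSCD, PKU Core), 2023, No. 5, pp. 531-540 (10 pages)
To explore life prediction and reliability evaluation methods suited to coated self-lubricating spherical plain bearings, a remaining life prediction model based on a convolutional neural network (CNN) and a long short-term memory (LSTM) network is proposed. First, the CNN extracts failure features from the friction torque signal of the bearing. The torque signal, after principal component analysis (PCA) and filtering, is then fed into the LSTM network for training, yielding a life prediction model that accurately predicts the remaining life of the bearing. Finally, based on accelerated life test data, a two-parameter Weibull distribution model is used to evaluate the service reliability of the bearings. The results show that coated self-lubricating spherical plain bearings can operate stably for a long time at a high reliability level (90%) under light-load, low-frequency conditions.
Keywords: coated self-lubricating spherical plain bearing; convolutional neural network; long short-term memory neural network; accelerated life test; reliability evaluation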
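The reliability step in the abstract above uses a two-parameter Weibull model, whose reliability function is R(t) = exp(-(t/eta)^beta) with shape beta and scale eta. The sketch below evaluates that function and inverts it to find the service time at a target reliability such as the 90% level mentioned in the abstract. The parameter values are hypothetical placeholders, not fitted values from the paper.

```python
import math


def weibull_reliability(t: float, shape: float, scale: float) -> float:
    """Probability of surviving beyond time t under a two-parameter Weibull model."""
    return math.exp(-((t / scale) ** shape))


def time_at_reliability(r: float, shape: float, scale: float) -> float:
    """Invert R(t) = r for t, valid for 0 < r < 1."""
    return scale * (-math.log(r)) ** (1.0 / shape)


# Hypothetical parameters for illustration only (not taken from the paper).
beta, eta = 2.0, 1000.0
t90 = time_at_reliability(0.90, beta, eta)
print(f"time at 90% reliability: {t90:.1f} (same unit as eta)")
print(f"check: R(t90) = {weibull_reliability(t90, beta, eta):.3f}")
```

In practice the shape and scale would be estimated from the accelerated life test data, for example by maximum likelihood, before reading off the time at the required reliability.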
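SAF-CNN: A Sparse Acceleration Framework of Convolutional Neural Network for Embedded FPGAs Cited by: 2
19
Authors: 谢坤鹏, 仪德智, 刘义情, 刘航, 赫鑫宇, 龚成, 卢冶. 《计算机研究与发展》 (EI, CSCD, PKU Core), 2023, No. 5, pp. 1053-1072 (20 pages)
When deploying models on resource-constrained FPGAs, traditional convolutional neural network accelerators and inference frameworks often face challenges such as a wide variety of devices with extremely limited resources, underused data bandwidth, and complex operator types that are hard to adapt and poorly scheduled. A sparse acceleration framework of convolutional neural network (SAF-CNN) for embedded FPGAs is proposed, which jointly optimizes the hardware accelerator and the software inference framework through hardware-software co-design. First, SAF-CNN builds a parallel computing array and designs a parallel encoding and decoding scheme that transfers multiple data items per cycle, effectively reducing communication cost. Second, a fine-grained structured block-partition pruning algorithm is designed that prunes within blocks along the input channel dimension to obtain sparse yet regular weight matrices, significantly reducing the computation scale and the usage of resources such as DSP multipliers. Third, a dynamic input-channel expansion and runtime scheduling strategy compatible with depthwise separable convolution is proposed, enabling flexible adaptation of input channel parameters and resource sharing between depthwise and pointwise convolution. Finally, a computation graph reconstruction and hardware operator fusion optimization method is proposed to improve hardware execution efficiency. Experiments on two resource-constrained low-end FPGA heterogeneous platforms, Intel CycloneV and Xilinx ZU3EG, show that the SAF-CNN accelerator achieves 76.3 GOPS and 494.3 GOPS, respectively. Compared with a multi-core CPU, SAF-CNN achieves 3.5× and 2.2× performance improvements for object detection with the SSD_MobileNetV1 model, with a model inference speed of up to 26.5 fps.
Keywords: convolutional neural network; model compression; computation graph; accelerator design; inference framework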
A survey of neural network accelerator with software development environments
20
Authors: Jin Song, Xuemeng Wang, Zhipeng Zhao, Wei Li, Tian Zhi. Journal of Semiconductors (EI, CAS, CSCD), 2020, No. 2, pp. 20-28 (9 pages)
In recent years, deep learning algorithms have been widely deployed from cloud servers to terminal units, and researchers have proposed various neural network accelerators and software development environments. In this article, we review representative neural network accelerators. As a whole, the corresponding software stack must consider the hardware architecture of the specific accelerator to enhance end-to-end performance. We also summarize the programming environments of neural network accelerators and the optimizations in their software stacks. Finally, we comment on future trends in neural network accelerators and programming environments.
Keywords: neural network accelerator; compiling optimization; programming environments