18 articles found
Design space exploration of neural network accelerator based on transfer learning
1
Authors: 吴豫章, ZHI Tian, SONG Xinkai, LI Xi. High Technology Letters [EI, CAS], 2023, No. 4, pp. 416-426 (11 pages).
With the increasing demand for computational power in artificial intelligence (AI) algorithms, dedicated accelerators have become a necessity. However, the complexity of hardware architectures, the vast design search space, and the complex tasks of accelerators pose significant challenges. Traditional search methods can become prohibitively slow as the search space continues to expand. A design space exploration (DSE) method based on transfer learning is proposed, which reduces the time spent on repeated training and uses multi-task models for different tasks on the same processor. The proposed method accurately predicts the latency and energy consumption associated with neural network accelerator design parameters, enabling faster identification of optimal outcomes than traditional methods. Compared with other DSE methods that use a multilayer perceptron (MLP), it also requires less training time. Comparative experiments with other methods demonstrate that the proposed method improves the efficiency of DSE without compromising the accuracy of the results.
Keywords: design space exploration (DSE); transfer learning; neural network accelerator; multi-task learning
Design and implementation of dual-mode configurable memory architecture for CNN accelerator
2
Authors: 山蕊, LI Xiaoshuo, GAO Xu, HUO Ziqing. High Technology Letters [EI, CAS], 2024, No. 2, pp. 211-220 (10 pages).
With the rapid development of deep learning algorithms, computational complexity and functional diversity are increasing rapidly. However, the gap between high computational density and insufficient memory bandwidth under the traditional von Neumann architecture keeps widening. An analysis of the algorithmic characteristics of convolutional neural networks (CNNs) shows that the memory access patterns of convolution (CONV) and fully connected (FC) operations differ greatly. Based on this observation, a dual-mode reconfigurable distributed memory architecture for CNN accelerators is designed. It can be configured in Bank mode or first-in first-out (FIFO) mode to meet the access needs of different operations. A programmable memory control unit is also designed, which controls the dual-mode configurable distributed memory architecture through customized special access instructions and reduces data access latency. The proposed architecture is verified and tested by parallel implementation of several CNN algorithms. Experimental results show that the peak bandwidth reaches 13.44 GB/s at an operating frequency of 120 MHz. Compared with existing work, this design achieves 1.40, 1.12, 2.80 and 4.70 times the peak bandwidth.
Keywords: distributed memory structure; neural network accelerator; reconfigurable array processor; configurable memory structure
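The Bank/FIFO distinction in the entry above can be illustrated with a toy software model. This is a minimal sketch, not the paper's hardware: it assumes a single memory block whose mode is switched by a `configure()` call, and the class and method names (`DualModeMemory`, `configure`, `push`, `pop`) are hypothetical.

```python
from collections import deque


class DualModeMemory:
    """Toy model of one memory block switchable between Bank and FIFO modes.

    Hypothetical illustration; names and behaviour are not taken from the paper.
    """

    def __init__(self, depth: int) -> None:
        self.depth = depth
        self.mode = "bank"
        self.cells = [0] * depth  # storage addressed directly in Bank mode
        self.queue = deque()      # in-order storage used in FIFO mode

    def configure(self, mode: str) -> None:
        if mode not in ("bank", "fifo"):
            raise ValueError(f"unknown mode: {mode}")
        self.mode = mode
        self.queue.clear()

    def _require(self, mode: str) -> None:
        if self.mode != mode:
            raise RuntimeError(f"operation needs {mode} mode, block is in {self.mode} mode")

    # Bank mode: random access by address
    def write(self, addr: int, value: int) -> None:
        self._require("bank")
        self.cells[addr] = value

    def read(self, addr: int) -> int:
        self._require("bank")
        return self.cells[addr]

    # FIFO mode: first-in first-out streaming, no addresses
    def push(self, value: int) -> None:
        self._require("fifo")
        if len(self.queue) >= self.depth:
            raise OverflowError("FIFO is full")
        self.queue.append(value)

    def pop(self) -> int:
        self._require("fifo")
        return self.queue.popleft()


mem = DualModeMemory(depth=4)
mem.write(2, 7)
print(mem.read(2))           # 7
mem.configure("fifo")
mem.push(1)
mem.push(2)
print(mem.pop(), mem.pop())  # 1 2
```

One plausible mapping (an assumption, not stated in the abstract) is Bank mode for CONV, where overlapping input windows are re-read by address, and FIFO mode for FC, where weights are streamed once in order.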
A Study of Trends in the Online "Accelerationism" Ideological Current
3
Author: 刘明皞. Xibu Xuekan (西部学刊), 2024, No. 1, pp. 25-28 (4 pages).
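The failure of Western liberal democratic institutions has continually pushed the "accelerationism" ideological current out of the real world and into cyberspace. Proponents of online accelerationism promote a "destroy without building" outlook that leads individuals to abandon rational thinking, then incite groups to reject the existing political system, using extreme words and actions to create social rifts and hasten the collapse of regimes. In the face of incitement by online accelerationism, it is necessary to systematically study General Secretary Xi Jinping's important thinking on building China into a cyber power, strengthen rule-of-law management of the internet and ideological education, improve mechanisms for tracking and assessing public opinion, broaden channels of expression, dispel the negative influence of online accelerationism, and foster a clean cyberspace and a sound online environment.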
Keywords: "accelerationism" ideological current; internet governance; cyber power
A Method for Reducing Noise Radiated from Structures with Vibration Absorbers by Using an Accelerated Neural Network (citations: 2)
4
Authors: 李连进, 葛为民. Transactions of Tianjin University [EI, CAS], 2004, No. 1, pp. 9-15 (7 pages).
A method for reducing noise radiated from structures by means of vibration absorbers is presented. Since the usual design method for absorbers is ineffective for noise reduction, the peaks of noise power in the frequency domain are used as cost functions. As a result, the equations for obtaining the optimal absorber parameters become nonlinear. To obtain these parameters, an accelerated neural network procedure is presented. Numerical calculations were carried out for a wide plate-type cantilever beam, and experimental tests were also performed on the same beam. The results show that the present method is effective for reducing noise radiated from structures. The usual design method for absorbers is based on modal analysis, so the number of absorbers must equal the number of modes considered. Because the present method handles the nonlinear problem directly, it places no restriction on the number of absorbers or the number of modes.
Keywords: structure; vibration and noise control; vibration absorber; neural network; accelerated neural network
NNL: a domain-specific language for neural networks (citations: 1)
5
Authors: Wang Bingrui, Chen Yunji. High Technology Letters [EI, CAS], 2020, No. 2, pp. 160-167 (8 pages).
In recent years, neural networks (NNs) have received increasing attention from both academia and industry. The significant diversity among existing NNs and their hardware platforms makes NN programming a daunting task. In this paper, a domain-specific language (DSL) for NNs, the neural network language (NNL), is proposed to deliver productive NN programming and portable NN execution performance across hardware platforms. The productivity and flexibility of NN programming are enabled by abstracting NNs as a directed graph of blocks. The language is used to describe 4 representative and widely used NNs and to run them on 3 different hardware platforms (CPU, GPU and NN accelerator). Experimental results show that NNs written in the proposed language perform, on average, 14.5% better than the baseline implementations across these 3 platforms. Moreover, compared with the Caffe framework, which specifically targets the GPU platform, the code achieves similar performance.
Keywords: artificial neural network (NN); domain-specific language (DSL); neural network (NN) accelerator
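The "directed graph of blocks" abstraction described above can be sketched in a few lines. This is not NNL syntax: it is a minimal Python sketch, assuming scalar values and a hypothetical `BlockGraph` class, that shows how a network described as a DAG can be executed by visiting blocks in topological order (using the standard library's `graphlib`, Python 3.9+).

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Sequence


class BlockGraph:
    """Hypothetical sketch: a network described as a directed acyclic graph of blocks."""

    def __init__(self) -> None:
        self.ops: Dict[str, Callable[..., float]] = {}
        self.inputs: Dict[str, List[str]] = {}

    def add(self, name: str, op: Callable[..., float], inputs: Sequence[str]) -> None:
        self.ops[name] = op
        self.inputs[name] = list(inputs)

    def run(self, feeds: Dict[str, float]) -> Dict[str, float]:
        values = dict(feeds)
        # static_order() yields every block after all of its predecessors
        for name in TopologicalSorter(self.inputs).static_order():
            if name in values:  # graph inputs supplied by the caller
                continue
            args = [values[src] for src in self.inputs[name]]
            values[name] = self.ops[name](*args)
        return values


g = BlockGraph()
g.add("scale", lambda x: 2.0 * x, ["x"])
g.add("relu", lambda x: max(0.0, x), ["scale"])
g.add("out", lambda a, b: a + b, ["relu", "x"])  # residual-style join
print(g.run({"x": 3.0})["out"])  # 9.0
```

Keeping the graph description separate from its execution is what would let a backend lower the same graph to a CPU, GPU or accelerator; the single interpreter here stands in for such a backend.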
A survey of neural network accelerator with software development environments
6
Authors: Jin Song, Xuemeng Wang, Zhipeng Zhao, Wei Li, Tian Zhi. Journal of Semiconductors [EI, CAS, CSCD], 2020, No. 2, pp. 20-28 (9 pages).
In recent years, deep learning algorithms have been widely deployed from cloud servers to terminal units, and researchers have proposed various neural network accelerators and software development environments. In this article, we review representative neural network accelerators. Taken as a whole, the corresponding software stack must account for the hardware architecture of the specific accelerator to improve end-to-end performance. We also summarize the programming environments of neural network accelerators and the optimizations in their software stacks. Finally, we comment on future trends in neural network accelerators and programming environments.
Keywords: neural network accelerator; compiling optimization; programming environments
Optimizing deep learning inference on mobile devices with neural network accelerators
7
Authors: Zeng Xi, Xu Yunlong, Zhi Tian. High Technology Letters [EI, CAS], 2019, No. 4, pp. 417-425 (9 pages).
Deep learning is now widely used in intelligent apps on mobile devices. In pursuit of ultra-low power and latency, integrating neural network accelerators (NNAs) into mobile phones has become a trend. However, conventional deep learning programming frameworks do not support such devices well, leading to low computing efficiency and high memory occupation. To address this problem, a 2-stage pipeline is proposed for optimizing deep learning model inference on mobile devices with NNAs in terms of both speed and memory footprint. The 1st stage reduces the computation workload via graph optimization, including splitting and merging nodes. The 2nd stage goes further by optimizing at the compilation level, including kernel fusion and in-advance compilation. The proposed optimizations are evaluated on a commercial mobile phone with an NNA. Experimental results show that the proposed approaches achieve a 2.8× to 26× speedup and reduce the memory footprint by up to 75%.
Keywords: machine learning inference; neural network accelerator (NNA); low latency; kernel fusion; in-advance compilation
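Node merging and kernel fusion, as mentioned in the abstract above, can be illustrated on a straight-line graph. This is a minimal sketch under stated assumptions, not the paper's pass: operators are hypothetical `Op` records, "elementwise" ops map a scalar to a scalar, and fusing two of them composes their functions so the data is traversed once instead of twice.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Op:
    name: str
    fn: Callable              # scalar -> scalar if elementwise, else list -> list
    elementwise: bool = False


def fuse_elementwise(ops: List[Op]) -> List[Op]:
    """Merge each run of consecutive elementwise ops into a single op."""
    fused: List[Op] = []
    for op in ops:
        if op.elementwise and fused and fused[-1].elementwise:
            prev = fused.pop()
            # default arguments bind the current functions, not the loop variables
            fused.append(Op(f"{prev.name}+{op.name}",
                            lambda x, f=prev.fn, g=op.fn: g(f(x)),
                            elementwise=True))
        else:
            fused.append(op)
    return fused


def execute(ops: List[Op], data: List[float]) -> List[float]:
    for op in ops:
        if op.elementwise:
            data = [op.fn(x) for x in data]  # one pass, one output buffer per op
        else:
            data = op.fn(data)
    return data


graph = [
    Op("bias", lambda x: x + 1.0, elementwise=True),
    Op("relu", lambda x: max(0.0, x), elementwise=True),
    Op("sum", lambda xs: [sum(xs)]),
]
fused = fuse_elementwise(graph)
print([op.name for op in fused])          # ['bias+relu', 'sum']
print(execute(graph, [-3.0, 0.0, 2.0]))   # [4.0]
print(execute(fused, [-3.0, 0.0, 2.0]))   # [4.0]
```

The fused graph produces the same result with one fewer intermediate buffer, which is the kind of saving in speed and memory footprint the entry reports at a much larger scale.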
Networking of BeiDou Navigation Satellite System Accelerated with Two New Members
8
Aerospace China, 2012, No. 2, p. 21 (1 page).
At 4:50 on April 30, China's LM-3B/I rocket, an improved variant of the LM-3B, made its debut at the Xichang Satellite Launch Center and successfully sent the 12th and 13th BeiDou Navigation Satellite System satellites into the planned transfer orbit. It was the first time that China had launched two BeiDou satellites on one rocket. It was ...
Keywords: networking of BeiDou Navigation Satellite System Accelerated with Two New Members
Design and Implementation of A Dynamic Content Cache Module for Web Server (citations: 1)
9
Authors: LIU Dan, GUO Cheng-cheng, ZHANG Li. Wuhan University Journal of Natural Sciences [EI, CAS], 2004, No. 5, pp. 828-834 (7 pages).
The Web offers a very convenient way to access remote information resources, and an important measure of Web service quality is how long it takes to search for and retrieve information. Caching a Web server's dynamic content avoids repeated database queries and reduces how often the original resources are accessed, thereby speeding up the server's response. This paper describes the concept, advantages, principles and concrete implementation procedure of a dynamic content cache module for a Web server.
CLC number: TP 393.09. Foundation item: supported by the Science Committee of Wuhan. Biography: LIU Dan (1980-), male, Master's candidate; research interests: high-speed computer networks and high-performance server cluster systems.
Keywords: dynamic content caching; network acceleration; Apache module
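The caching principle described above (serve a stored copy of generated content instead of re-querying the database) can be sketched as a small time-to-live (TTL) cache. The paper implements an Apache module; this Python sketch is only a conceptual model, and the expiry policy, the `DynamicContentCache` class and the `render` callback are assumptions.

```python
import time
from typing import Callable, Dict, Tuple


class DynamicContentCache:
    """Hypothetical TTL cache for generated pages, keyed by request URL."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, url: str, render: Callable[[str], str]) -> str:
        now = self.clock()
        entry = self._entries.get(url)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]          # hit: no database query needed
        page = render(url)           # miss: query the database and build the page
        self._entries[url] = (now, page)
        return page

    def invalidate(self, url: str) -> None:
        """Drop an entry early, e.g. after the underlying data changes."""
        self._entries.pop(url, None)


calls = []


def render(url: str) -> str:
    calls.append(url)  # stands in for the expensive database query
    return f"<html>{url}</html>"


fake_now = [0.0]
cache = DynamicContentCache(ttl_seconds=10.0, clock=lambda: fake_now[0])
cache.get("/news", render)
cache.get("/news", render)   # served from cache
fake_now[0] = 11.0
cache.get("/news", render)   # expired, rendered again
print(len(calls))            # 2
```

The injectable clock is only there to make expiry testable; a real module would also need a policy for invalidating entries when the underlying data changes.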
Warehouse Environment Parameter Monitoring System and Sensor Error Correction Model Based on PSO-BP (citations: 5)
10
Authors: Lin Sen, Wang Guanglong, Chen Yingjie, Wang Le, Qiao Zhongtao, Gao Fengqi. Transactions of Nanjing University of Aeronautics and Astronautics [EI, CSCD], 2017, No. 3, pp. 333-340 (8 pages).
The warehouse environment parameter monitoring system is designed to avoid the complex networking and high cost of traditional monitoring systems. A sensor error correction model that combines particle swarm optimization (PSO) with the back propagation (BP) neural network algorithm is established to reduce nonlinear characteristics and improve the test accuracy of the system. Simulations and experiments indicate that the PSO-BP neural network algorithm offers a fast convergence rate and high diagnostic accuracy. The monitoring system provides higher measurement precision, lower power consumption, stable network data communication and a fault diagnosis function. The system has been applied to monitoring the environmental parameters of warehouses, special vehicles, ships, etc.
Keywords: Warehouse warehouse correction networking swarm terminals hidden acceleration normalized intelligent
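Particle swarm optimization, the "PSO" half of the PSO-BP model above, can be shown with a minimal global-best swarm. This sketch does not reproduce the paper's model: in PSO-BP the particle position would encode BP network weights, whereas here it encodes just a gain and an offset fitted to synthetic readings. The inertia and acceleration coefficients (`w`, `c1`, `c2`) are common textbook choices, not values from the paper.

```python
import random
from typing import Callable, List, Sequence, Tuple


def pso(objective: Callable[[Sequence[float]], float], dim: int,
        n_particles: int = 20, iters: int = 200,
        w: float = 0.7, c1: float = 1.5, c2: float = 1.5,
        bounds: Tuple[float, float] = (-5.0, 5.0), seed: int = 0
        ) -> Tuple[List[float], float]:
    """Minimal global-best particle swarm optimizer (minimization)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                  # best position seen by each particle
    pbest_val = [objective(p) for p in pos]
    first = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[first][:], pbest_val[first]  # best seen by the swarm

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])   # pull toward own best
                             + c2 * r2 * (gbest[d] - pos[i][d]))     # pull toward swarm best
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


# Toy "sensor correction": find gain a and offset b mapping raw readings to reference values.
raw = [0.0, 1.0, 2.0, 3.0]
ref = [1.0, 3.0, 5.0, 7.0]  # synthetic data generated from ref = 2 * raw + 1


def mse(params: Sequence[float]) -> float:
    a, b = params
    return sum((a * x + b - y) ** 2 for x, y in zip(raw, ref)) / len(raw)


(a, b), err = pso(mse, dim=2)
print(round(a, 2), round(b, 2))
```

In a PSO-BP setup, the objective would instead be the BP network's training error, and the best particle found would typically seed the network's weights before ordinary back-propagation refines them.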
GShuttle: Optimizing Memory Access Efficiency for Graph Convolutional Neural Network Accelerators (citations: 1)
11
Authors: 李家军, 王可, 郑皓, Ahmed Louri. Journal of Computer Science & Technology [SCIE, EI, CSCD], 2023, No. 1, pp. 115-127 (13 pages).
Graph convolutional neural networks (GCNs) have emerged as an effective approach to extending deep learning to graph data analytics, but they are computationally challenging given irregular graphs and the large number of nodes in a graph. GCNs involve chained sparse-dense matrix multiplications with six loops, which results in a large design space for GCN accelerators. Prior work on GCN acceleration either employs limited loop optimization techniques or determines the design variables by random sampling, which can hardly exploit data reuse efficiently and thus degrades system efficiency. To overcome this limitation, this paper proposes GShuttle, a GCN acceleration scheme that maximizes memory access efficiency to achieve high performance and energy efficiency. GShuttle systematically explores loop optimization techniques for GCN acceleration and quantitatively analyzes the design objectives (e.g., required DRAM accesses and SRAM accesses) by analytical calculation based on multiple design variables. GShuttle further employs two approaches, pruned search space sweeping and greedy search, to find the optimal design variables under given design constraints. We demonstrate the efficacy of GShuttle by evaluation on five widely used graph datasets. Experimental simulations show that GShuttle reduces the number of DRAM accesses by a factor of 1.5 and saves energy by a factor of 1.7 compared with state-of-the-art approaches.
Keywords: graph convolutional neural network; memory access; neural network accelerator
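The combination described above, an analytical count of DRAM traffic plus a sweep over design variables pruned by an SRAM constraint, can be illustrated on a single tiled dense matrix multiplication. This is not GShuttle's cost model for chained sparse-dense GCN layers; the loop order, the fixed `tk`, the power-of-two tile candidates and all function names are assumptions made for a small, checkable example, and the greedy-search variant is not shown.

```python
from typing import List, Optional, Tuple


def ceil_div(a: int, b: int) -> int:
    return -(-a // b)


def dram_traffic(M: int, N: int, K: int, tm: int, tn: int) -> int:
    """Words moved for C[M,N] = A[M,K] @ B[K,N], loop order i(tm) -> j(tn) -> k.

    The C tile stays on chip across the k loop, so:
      each A tile is re-read once per column tile of B -> M*K*ceil(N/tn)
      each B tile is re-read once per row tile of A    -> K*N*ceil(M/tm)
      C is written once                                 -> M*N
    """
    return M * K * ceil_div(N, tn) + K * N * ceil_div(M, tm) + M * N


def sram_words(tm: int, tn: int, tk: int) -> int:
    return tm * tk + tk * tn + tm * tn  # A tile + B tile + C tile


def powers_of_two(limit: int) -> List[int]:
    return [1 << i for i in range(limit.bit_length())]


def sweep(M: int, N: int, K: int, capacity: int, tk: int = 8
          ) -> Optional[Tuple[int, int, int]]:
    """Return (traffic, tm, tn) with the least DRAM traffic that fits in SRAM."""
    best = None
    for tm in powers_of_two(M):
        for tn in powers_of_two(N):
            if sram_words(tm, tn, tk) > capacity:
                continue  # prune tilings that do not fit on chip
            cost = dram_traffic(M, N, K, tm, tn)
            if best is None or cost < best[0]:
                best = (cost, tm, tn)
    return best


print(sweep(256, 256, 256, capacity=4096))  # (851968, 32, 64)
```

Larger tiles cut re-reads but need more on-chip storage; the sweep simply makes that trade-off explicit for one loop order.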
DyPipe: A Holistic Approach to Accelerating Dynamic Neural Networks with Dynamic Pipelining
12
Authors: 庄毅敏, 胡杏, 陈小兵, 支天. Journal of Computer Science & Technology [SCIE, EI, CSCD], 2023, No. 4, pp. 899-910 (12 pages).
Dynamic neural network (NN) techniques are increasingly important because they enable deep learning with more complex network architectures. However, existing studies, which predominantly optimize static computational graphs with static scheduling methods, usually focus on optimizing static neural networks in deep neural network (DNN) accelerators. We analyze the execution process of dynamic neural networks and observe that dynamic features introduce challenges for efficient scheduling and pipelining in existing DNN accelerators. We propose DyPipe, a holistic approach to optimizing dynamic neural network inference in enhanced DNN accelerators. DyPipe achieves significant performance improvements for dynamic neural networks while introducing negligible overhead for static neural networks. Our evaluation demonstrates that DyPipe achieves a 1.7× speedup on dynamic neural networks and retains more than 96% of the performance on static neural networks.
Keywords: dynamic neural network (NN); deep neural network (DNN) accelerator; dynamic pipelining
Recent advances in efficient computation of deep convolutional neural networks (citations: 36)
13
Authors: Jian CHENG, Pei-song WANG, Gang LI, Qing-hao HU, Han-qing LU. Frontiers of Information Technology & Electronic Engineering [SCIE, EI, CSCD], 2018, No. 1, pp. 64-77 (14 pages).
Deep neural networks have evolved remarkably over the past few years and are currently the fundamental tools of many intelligent systems. At the same time, the computational complexity and resource consumption of these networks continue to increase. This poses a significant challenge to the deployment of such networks, especially in real-time applications or on resource-limited devices. Thus, network acceleration has become a hot topic within the deep learning community. For the hardware implementation of deep neural networks, a number of accelerators based on field-programmable gate arrays (FPGAs) or application-specific integrated circuits (ASICs) have been proposed in recent years. In this paper, we provide a comprehensive survey of recent advances in network acceleration, compression, and accelerator design from both the algorithm and hardware points of view. Specifically, we provide a thorough analysis of each of the following topics: network pruning, low-rank approximation, network quantization, teacher–student networks, compact network design, and hardware accelerators. Finally, we introduce and discuss a few possible future directions.
Keywords: Deep neural networks; Acceleration; Compression; Hardware accelerator
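Network quantization, one of the topics surveyed above, can be illustrated with the simplest common scheme: symmetric per-tensor quantization of weights to signed 8-bit integers. This is a generic textbook sketch, not a method from the survey; the choice of the range [-127, 127] and a single scale per tensor are assumptions.

```python
from typing import List, Sequence, Tuple


def quantize_int8(weights: Sequence[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization to the signed 8-bit range [-127, 127]."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: Sequence[int], scale: float) -> List[float]:
    return [v * scale for v in q]


w = [0.5, -1.27, 0.0, 1.0]
q, scale = quantize_int8(w)
print(q)  # [50, -127, 0, 100]
```

Each weight now needs 8 bits instead of 32, and the reconstruction error per weight is at most half the scale; hardware accelerators can then run the multiplications in integer arithmetic and apply `scale` once per output.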
Design and Tool Flow of a Reconfigurable Asynchronous Neural Network Accelerator (citations: 3)
14
Authors: Jilin Zhang, Hui Wu, Weijia Chen, Shaojun Wei, Hong Chen. Tsinghua Science and Technology [SCIE, EI, CAS, CSCD], 2021, No. 5, pp. 565-573 (9 pages).
Convolutional Neural Networks (CNNs) are widely used in computer vision, natural language processing, and other fields, and real applications generally require low power and high efficiency. Thus, energy efficiency has become a critical indicator of CNN accelerators. Because asynchronous circuits offer low power consumption, high speed, and freedom from clock distribution problems, we design and implement an energy-efficient asynchronous CNN accelerator in a 65 nm Complementary Metal Oxide Semiconductor (CMOS) process. Given the absence of a commercial design tool flow for asynchronous circuits, we develop a novel design flow that efficiently implements Click-based asynchronous bundled-data circuits down to mask layout with conventional Electronic Design Automation (EDA) tools. We also introduce an adaptive delay matching method and perform accurate static timing analysis on the circuits to ensure correct timing. An accelerator for a handwriting recognition network (the LeNet-5 model) is implemented. Silicon test results show that the asynchronous accelerator consumes 30% less power in the computing array than the synchronous one, and that its energy efficiency reaches 1.538 TOPS/W, 12% higher than that of the synchronous chip.
Keywords: Convolutional Neural Network (CNN) accelerator; asynchronous circuit; energy efficiency; adaptive delay matching; asynchronous design flow
Tetris: A Heuristic Static Memory Management Framework for Uniform Memory Multicore Neural Network Accelerators
15
Authors: Xiao-Bing Chen, Hao Qi, Shao-Hui Peng, Yi-Min Zhuang, Tian Zhi, Yun-Ji Chen (Distinguished Member, CCF). Journal of Computer Science & Technology [SCIE, EI, CSCD], 2022, No. 6, pp. 1255-1270 (16 pages).
Uniform memory multicore neural network accelerators (UNNAs) furnish huge computing power to emerging neural network applications. Meanwhile, as neural network architectures grow deeper and wider, limited memory capacity has become a constraint on deploying models on UNNA platforms. Efficiently managing memory space and reducing workload footprints are therefore urgent problems. In this paper, we propose Tetris: a heuristic static memory management framework for UNNA platforms. Tetris reconstructs the execution flows and synchronization relationships among cores to analyze each tensor's liveness interval. The memory management problem is then converted into a sequence permutation problem. Tetris uses a genetic algorithm to explore the permutation space, optimizing the memory management strategy and reducing memory footprints. We evaluate several typical neural networks, and the experimental results demonstrate that Tetris outperforms state-of-the-art memory allocation methods, achieving average memory reduction ratios of 91.9% and 87.9% on a quad-core and a 16-core Cambricon-X platform, respectively.
Keywords: multicore neural network accelerators; liveness analysis; static memory management; memory reuse; genetic algorithm
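Why a permutation of tensors can decide the memory footprint, as the abstract above describes, can be shown with liveness intervals and a first-fit placer. This is a minimal sketch: the first-fit placement rule, the tuple layout and the exhaustive search over all orders are assumptions for a three-tensor example. Tetris itself explores the permutation space with a genetic algorithm, which would take the place of `best_peak` for realistic tensor counts.

```python
from itertools import permutations
from typing import Dict, List, Sequence, Tuple

# (name, size, first_use_step, last_use_step); the liveness interval is inclusive
Tensor = Tuple[str, int, int, int]


def live_together(a: Tensor, b: Tensor) -> bool:
    return a[2] <= b[3] and b[2] <= a[3]


def place(order: Sequence[Tensor]) -> Tuple[int, Dict[str, int]]:
    """First-fit: give each tensor the lowest offset not used by a co-live tensor."""
    placed: List[Tuple[Tensor, int]] = []
    offsets: Dict[str, int] = {}
    for t in order:
        busy = sorted((off, off + p[1]) for p, off in placed if live_together(p, t))
        offset = 0
        for lo, hi in busy:
            if offset + t[1] <= lo:
                break             # t fits in the gap below this block
            offset = max(offset, hi)
        placed.append((t, offset))
        offsets[t[0]] = offset
    peak = max((off + p[1] for p, off in placed), default=0)
    return peak, offsets


def best_peak(tensors: Sequence[Tensor]) -> int:
    """Exhaustive search over orders; a genetic algorithm would sample this space instead."""
    return min(place(order)[0] for order in permutations(tensors))


A: Tensor = ("A", 2, 0, 0)   # short-lived
L: Tensor = ("L", 2, 0, 2)   # long-lived
Z: Tensor = ("Z", 4, 1, 2)   # large, overlaps only L
print(place([A, L, Z])[0])   # 8: L lands above A, so Z must go above L
print(best_peak([A, L, Z]))  # 6: e.g. order L, A, Z lets Z reuse A's addresses
```

The same three tensors need 8 or 6 units of memory depending only on the order in which they are placed, which is why the allocation problem can be recast as a search over permutations.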
Dissection of genetic network underlying important agronomic traits accelerates modern breeding in soybean
16
Science Foundation in China [CAS], 2017, No. 4, p. 33 (1 page).
With the support of the National Natural Science Foundation of China and the "Strategic Priority Research Program" of the Chinese Academy of Sciences, a collaborative study by the research groups led by Professors Tian Zhixi (田志喜), Wang Guodong (王国栋), and Zhu Baoge (朱保葛) from the ...
Keywords: Dissection of genetic network underlying important agronomic traits accelerates modern breeding in soybean
DRNet: Towards fast, accurate and practical dish recognition (citations: 1)
17
Authors: CHENG SiYuan, CHU BinFei, ZHONG BiNeng, ZHANG ZiKai, LIU Xin, TANG ZhenJun, LI XianXian. Science China (Technological Sciences) [SCIE, EI, CAS, CSCD], 2021, No. 12, pp. 2651-2661 (11 pages).
Existing dish recognition algorithms mainly focus on accuracy with predefined classes, which limits their application scope. In this paper, we propose a practical two-stage dish recognition framework (DRNet) that balances speed and accuracy while adapting to variation in the number of classes. In the first stage, we build an arbitrary-oriented dish detector (AODD) to localize dishes, which effectively alleviates the impact of background noise and pose variations. In the second stage, we propose a dish re-identifier (DReID) that recognizes registered dishes in order to handle uncertain categories. To further improve the accuracy of DRNet, we design an attribute recognition (AR) module to predict dish attributes, which serve as auxiliary information to enhance the discriminative ability of DRNet. Moreover, the model is pruned and quantized so that it can be deployed in embedded environments. Finally, to facilitate the study of dish recognition, a well-annotated dataset is established. Our AODD, DReID, AR, and DRNet run at about 14, 25, 16, and 5 fps, respectively, on RKNN 3399 Pro hardware.
Keywords: neural network acceleration; neural network quantization; object detection; reidentification; dish recognition
Design of high parallel CNN accelerator based on FPGA for AIoT
18
Authors: Lin Zhijian, Gao Xuewei, Chen Xiaopei, Zhu Zhipeng, Du Xiaoyong, Chen Pingping. The Journal of China Universities of Posts and Telecommunications [EI, CSCD], 2022, No. 5, pp. 1-9, 61 (10 pages).
To tackle the challenge of running convolutional neural networks (CNNs) on field-programmable gate arrays (FPGAs) given their computational complexity, a high-performance CNN hardware accelerator was designed in the Verilog hardware description language. It uses a pipeline architecture with three parallel dimensions: input channels, output channels, and convolution kernels. First, two multiply-and-accumulate (MAC) operations were packed into one digital signal processing (DSP) block of the FPGA to double the computation rate of the CNN accelerator. Second, feature map block partitioning and a special memory arrangement were proposed to reduce the total amount of off-chip memory access and relieve pressure on FPGA bandwidth. Finally, an efficient computational array combining a multiply-add tree with the Winograd fast convolution algorithm was designed to balance hardware resource consumption and computational performance. The highly parallel CNN accelerator was deployed on an Alinx ZU3EG board, using the YOLOv3-tiny algorithm as the test object. The average computing performance of the CNN accelerator is 127.5 giga operations per second (GOPS). Experimental results show that the hardware architecture effectively improves the computational power of CNNs and outperforms other existing schemes in power consumption and in the efficiency of DSPs and block random access memories (BRAMs).
Keywords: artificial intelligence of things (AIoT); convolutional neural network (CNN) accelerator; Winograd convolution; field-programmable gate array (FPGA)
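Two of the arithmetic tricks named in the abstract above can be checked in software. The first function is the standard 1-D Winograd F(2,3) algorithm, which yields two outputs of a 3-tap convolution with 4 multiplications instead of 6. The second shows the general idea of packing two multiplications that share an operand into one wide multiply. The paper's exact packing scheme is not given in the abstract; this sketch assumes unsigned 8-bit operands and an 18-bit shift, which would fit a 27x18 multiplier such as the one in a Xilinx DSP48E2 slice. Signed operands need an extra correction step that is not shown.

```python
from typing import List, Sequence, Tuple


def winograd_f23(d: Sequence[float], g: Sequence[float]) -> List[float]:
    """Two outputs of a 3-tap 1-D convolution with 4 multiplications instead of 6."""
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    # In hardware the filter-side terms (e.g. (g0 + g1 + g2) / 2) are precomputed once per filter
    m1 = (d0 - d2) * g0
    m2 = (d1 + d2) * (g0 + g1 + g2) / 2
    m3 = (d2 - d1) * (g0 - g1 + g2) / 2
    m4 = (d1 - d3) * g2
    return [m1 + m2 + m3, m2 - m3 - m4]


def direct_conv(d: Sequence[float], g: Sequence[float]) -> List[float]:
    return [sum(d[i + k] * g[k] for k in range(3)) for i in range(2)]


def packed_two_products(a: int, b: int, c: int, shift: int = 18) -> Tuple[int, int]:
    """Compute a*c and b*c with one wide multiplication (unsigned 8-bit operands assumed)."""
    if not all(0 <= v < 256 for v in (a, b, c)):
        raise ValueError("sketch assumes unsigned 8-bit operands")
    product = ((a << shift) + b) * c  # = (a*c << shift) + b*c, and b*c < 2**shift
    return product >> shift, product & ((1 << shift) - 1)


print(winograd_f23([1, 2, 3, 4], [2, 1, 3]))  # [13.0, 19.0]
print(direct_conv([1, 2, 3, 4], [2, 1, 3]))   # [13, 19]
print(packed_two_products(200, 123, 250))     # (50000, 30750)
```

Both tricks trade a small amount of extra addition or bit manipulation for fewer multiplier uses. On an FPGA that matters because DSP blocks, not adders, are usually the scarce resource.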