Abstract: Training deep neural networks (DNNs) requires a significant amount of time and resources to obtain acceptable results, which severely limits their deployment in resource-limited platforms. This paper proposes DarkFPGA, a novel customizable framework to efficiently accelerate entire DNN training on a single FPGA platform. First, we explore batch-level parallelism to enable efficient FPGA-based DNN training. Second, we devise a novel hardware architecture optimised by a batch-oriented data pattern and tiling techniques to effectively exploit parallelism. Moreover, an analytical model is developed to determine the optimal design parameters for the DarkFPGA accelerator with respect to a specific network specification and the FPGA resource constraints. Our results show that, when training VGG-like networks on the CIFAR dataset with 8-bit integers, the accelerator on the Maxeler MAX5 platform runs about 10 times faster than CPU training and uses about a third of the energy of GPU training.
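The abstract does not detail the analytical model, but its role can be illustrated with a small design-space search: enumerate candidate parallelism factors, estimate cycles and resource usage from simple formulas, and keep the fastest design that fits the budget. The sketch below is a hypothetical approximation; the function names, cost formulas, and DSP budget are assumptions, not DarkFPGA's actual model.

```python
# Hypothetical analytical model: pick batch-level (pb) and channel-level (pc)
# parallelism factors for one layer under a DSP budget. All formulas and
# numbers are illustrative, not DarkFPGA's.
import itertools
import math

def estimate(batch, c_out, c_in, pb, pc, dsp_budget=5760):
    dsps = pb * pc                        # assume one MAC engine per lane
    if dsps > dsp_budget:
        return None                       # violates the resource constraint
    # pb samples and pc output channels are processed per cycle.
    cycles = math.ceil(batch / pb) * math.ceil(c_out / pc) * c_in
    return {"pb": pb, "pc": pc, "dsps": dsps, "cycles": cycles}

def best_design(batch=128, c_out=512, c_in=512):
    factors = [1, 2, 4, 8, 16, 32, 64, 128]
    candidates = (estimate(batch, c_out, c_in, pb, pc)
                  for pb, pc in itertools.product(factors, factors))
    return min((c for c in candidates if c), key=lambda c: c["cycles"])

print(best_design())   # fastest feasible (pb, pc) pair under the DSP budget
```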
Funding: Supported in part by the National Natural Science Foundation of China under Grant 61461013, in part by the Natural Science Foundation of Guangxi Province under Grant 2018GXNSFAA281179, and in part by the Dean Project of Guangxi Key Laboratory of Wireless Broadband Communication and Signal Processing under Grant GXKL06160103.
Abstract: In this study, we developed a system based on deep space–time neural networks for gesture recognition. When users change or the number of gesture categories increases, the accuracy of gesture recognition decreases considerably, because most gesture recognition systems cannot accommodate both user differentiation and gesture diversity. To overcome the limitations of existing methods, we designed a one-dimensional parallel long short-term memory–fully convolutional network (LSTM–FCN) model to extract gesture features of different dimensions. The LSTM can learn complex temporal dynamics, whereas the FCN can predict gestures efficiently by extracting deep, abstract gesture features in the spatial dimension. In the experiment, 50 types of gestures from five users were collected and evaluated. The experimental results demonstrate the effectiveness of the system and its robustness to gesture variety and individual differences. Statistical analysis of the recognition results indicated an average accuracy of approximately 98.9%.
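As a concrete illustration of the parallel two-branch design, the PyTorch sketch below runs an LSTM branch for temporal dynamics alongside a 1-D fully convolutional branch for spatial features, then concatenates the two before classification. Layer widths, kernel sizes, and the fusion-by-concatenation scheme are our assumptions, not the paper's exact configuration.

```python
# Minimal parallel LSTM-FCN sketch for 50-class gesture recognition.
import torch
import torch.nn as nn

class ParallelLSTMFCN(nn.Module):
    def __init__(self, in_channels=1, num_classes=50):
        super().__init__()
        # Temporal branch: LSTM over the raw 1-D sequence.
        self.lstm = nn.LSTM(input_size=in_channels, hidden_size=64,
                            batch_first=True)
        # Spatial branch: 1-D fully convolutional feature extractor.
        self.fcn = nn.Sequential(
            nn.Conv1d(in_channels, 128, kernel_size=8, padding=4),
            nn.BatchNorm1d(128), nn.ReLU(),
            nn.Conv1d(128, 128, kernel_size=5, padding=2),
            nn.BatchNorm1d(128), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),          # global average pooling
        )
        self.head = nn.Linear(64 + 128, num_classes)

    def forward(self, x):                     # x: (batch, seq_len, channels)
        _, (h, _) = self.lstm(x)              # h: (1, batch, 64)
        t_feat = h.squeeze(0)
        s_feat = self.fcn(x.transpose(1, 2)).squeeze(-1)  # (batch, 128)
        return self.head(torch.cat([t_feat, s_feat], dim=1))

logits = ParallelLSTMFCN()(torch.randn(8, 128, 1))  # -> (8, 50)
```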
Funding: The National Natural Science Foundation of China (No. 61603091).
Abstract: To improve the detection accuracy of small objects, a neighborhood fusion-based hierarchical parallel feature pyramid network (NFPN) is proposed. Unlike the layer-by-layer structure adopted in the feature pyramid network (FPN) and the deconvolutional single shot detector (DSSD), where the bottom layers of the feature pyramid rely on the top layers, NFPN builds the feature pyramid with no connections between the upper and lower layers; that is, it only fuses shallow features on similar scales. NFPN is highly portable and can be embedded in many models to further boost performance. Extensive experiments on the PASCAL VOC 2007, VOC 2012, and COCO datasets demonstrate that the NFPN-based SSD, without intricate tricks, can exceed the DSSD model in both detection accuracy and inference speed, especially for small objects, e.g., 4% to 5% higher mAP (mean average precision) than SSD and 2% to 3% higher mAP than DSSD. On the VOC 2007 test set, the NFPN-based SSD with 300×300 input reaches 79.4% mAP at 34.6 frames/s, and the mAP rises to 82.9% with a multi-scale testing strategy.
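The key structural idea, fusing each level only with its immediate neighbours on similar scales rather than through a top-down path, can be sketched in PyTorch as follows. Channel counts and the resize-then-add fusion are assumptions; the paper's exact NFPN block may differ.

```python
# Minimal neighborhood-fusion sketch: each pyramid level is fused only with
# its adjacent levels, with no top-down dependence.
import torch
import torch.nn as nn
import torch.nn.functional as F

class NeighborhoodFusion(nn.Module):
    def __init__(self, channels=(256, 512, 1024), out_channels=256):
        super().__init__()
        self.lateral = nn.ModuleList(
            nn.Conv2d(c, out_channels, kernel_size=1) for c in channels)

    def forward(self, feats):                 # feats: list of (B, C_i, H_i, W_i)
        laterals = [l(f) for l, f in zip(self.lateral, feats)]
        fused = []
        for i, x in enumerate(laterals):
            acc = x
            for j in (i - 1, i + 1):          # only the neighbouring levels
                if 0 <= j < len(laterals):
                    acc = acc + F.interpolate(laterals[j], size=x.shape[-2:],
                                              mode="nearest")
            fused.append(acc)
        return fused

feats = [torch.randn(1, c, s, s) for c, s in [(256, 38), (512, 19), (1024, 10)]]
outs = NeighborhoodFusion()(feats)            # three 256-channel fused maps
```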
Funding: Supported by the Technical Education Quality Improvement Programme (TEQIP-III). The project is implemented by NPIU, a unit of MHRD, Govt. of India, for implementation of World Bank Assisted Projects in Technical Education.
Abstract: Purpose: The trend of "Deep Learning (DL) for Internet of Things (IoT)" has gained fresh momentum, with enormous upcoming applications employing these models as their processing engine and the Cloud as their resource giant. But this picture leads to underutilization of the ever-increasing device pool of IoT, which had already passed the 15 billion mark in 2015. Thus, it is high time to explore a different approach to tackle this issue, keeping in view the characteristics and needs of the two fields. Processing at the Edge can boost applications with real-time deadlines while complementing security. Design/methodology/approach: This review paper contributes towards three cardinal directions of research in the field of DL for IoT. The first section covers the categories of IoT devices and how the Fog can aid in overcoming the underutilization of millions of devices, forming the realm of the things for IoT. The second direction handles the issue of the immense computational requirements of DL models by uncovering specific compression techniques. An appropriate combination of these techniques, including regularization, quantization, and pruning, can aid in building an effective compression pipeline for establishing DL models for IoT use cases. The third direction incorporates both these views and introduces a novel approach of parallelization for setting up a distributed-systems view of DL for IoT. Findings: DL models are growing deeper with every passing year. Well-coordinated distributed execution of such models using the Fog displays a promising future for the IoT application realm. It is realized that a vertically partitioned compressed deep model can handle the trade-off between size, accuracy, communication overhead, bandwidth utilization, and latency, but at the expense of a considerable additional memory footprint. To reduce the memory budget, we propose to exploit HashedNets as potentially favorable candidates for distributed frameworks. However, the critical point between accuracy and size for such models needs further investigation. Originality/value: To the best of our knowledge, no study has explored the inherent parallelism in deep neural network architectures for their efficient distribution over the Edge–Fog continuum. Besides covering techniques and frameworks that have tried to bring inference to the Edge, the review uncovers significant issues and possible future directions for endorsing deep models as processing engines for real-time IoT. The study is directed at both researchers and industrialists taking various applications to the Edge for a better user experience.
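The HashedNets proposal in the findings rests on the hashing trick: virtual weights are mapped by a cheap hash into a small shared parameter vector, so a large layer is stored in a fraction of its nominal size. The NumPy sketch below is a minimal illustration under that assumption; hashed_weight and hashed_matvec are hypothetical names, and a practical layer would vectorize the lookup.

```python
# Hashing-trick sketch: a virtual weight W[i, j] is never stored; it is
# looked up in a small shared parameter vector via a hash of its indices.
import numpy as np
import zlib

def hashed_weight(params, i, j, seed=0):
    """Look up the virtual weight (i, j) in the shared parameter vector."""
    h = zlib.crc32(f"{seed}:{i}:{j}".encode())
    sign = 1.0 if (h >> 31) & 1 else -1.0     # sign hash reduces bias
    return sign * params[h % len(params)]

def hashed_matvec(params, x, rows, seed=0):
    """y = W @ x where W is materialised only hash-by-hash."""
    return np.array([
        sum(hashed_weight(params, i, j, seed) * x[j] for j in range(len(x)))
        for i in range(rows)
    ])

params = np.random.randn(64)                  # 64 real parameters ...
y = hashed_matvec(params, np.random.randn(256), rows=128)  # ... act as 128x256
```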
Funding: Supported by the General Program of the National Natural Science Foundation of China (No. 61872043).
Abstract: Efficiently supporting artificial intelligence (AI) applications on heterogeneous mobile platforms is important, in particular coordinately executing a deep neural network (DNN) model on the multiple computing devices of one mobile platform. This paper proposes HOPE, an end-to-end heterogeneous inference framework running on mobile platforms that distributes the operators in a DNN model across different computing devices. The problem is formalized as an integer linear programming (ILP) problem, and a heuristic algorithm is proposed to determine a near-optimal heterogeneous execution plan. The experimental results demonstrate that HOPE can reduce inference latency by up to 36.2% (22.0% on average) compared with MOSAIC, by up to 22.0% (10.2% on average) compared with StarPU, and by up to 41.8% (18.4% on average) compared with μLayer.
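The abstract does not spell out the heuristic, but a common shape for such schedulers is greedy list scheduling: visit operators in topological order and place each on the device that finishes it earliest, charging a transfer cost when a predecessor lives elsewhere. The sketch below follows that shape under invented cost tables; it is not HOPE's actual algorithm.

```python
# Greedy operator-to-device placement sketch with a simple transfer penalty.
def schedule(ops, deps, cost, transfer, devices=("cpu", "gpu", "dsp")):
    finish, placed = {}, {}
    device_free = {d: 0.0 for d in devices}
    for op in ops:                            # ops already in topological order
        best = None
        for d in devices:
            ready = max([device_free[d]] +
                        [finish[p] + (transfer if placed[p] != d else 0.0)
                         for p in deps.get(op, [])])
            end = ready + cost[op][d]
            if best is None or end < best[0]:
                best = (end, d)
        finish[op], placed[op] = best
        device_free[best[1]] = best[0]
    return placed, max(finish.values())       # plan and makespan

ops = ["conv1", "conv2a", "conv2b", "concat"]
deps = {"conv2a": ["conv1"], "conv2b": ["conv1"], "concat": ["conv2a", "conv2b"]}
cost = {op: {"cpu": 3.0, "gpu": 1.0, "dsp": 2.0} for op in ops}
print(schedule(ops, deps, cost, transfer=0.5))
```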
Abstract: Accurate short-term photovoltaic (PV) power forecasting is important for ensuring power quality and improving the operational reliability of power systems. This paper therefore proposes a short-term PV power forecasting method based on the wavelet transform and hybrid deep learning. First, weather types are divided into ideal weather (sunny) and non-ideal weather (cloudy, overcast, etc.). For ideal weather, the historical PV power time series is converted into two-dimensional images that serve as the input of a Hybrid Deep Learning Model (HDLM). For non-ideal weather, the wavelet transform is used to decompose the historical PV power time series, and the resulting components together with meteorological parameters are converted into three-dimensional images as the input of the HDLM. A parallel structure is introduced into the HDLM, consisting of multiple parallel convolutional neural networks and bidirectional long short-term memory networks. Experimental results show that the proposed method achieves high forecasting accuracy under both ideal and non-ideal weather conditions.
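A minimal sketch of the non-ideal-weather preprocessing, using PyWavelets: the power series is decomposed, each component is reconstructed to full length, and the components are stacked with the meteorological series into an image-like array for the hybrid model. The wavelet choice ('db4'), decomposition level, and stacking layout are assumptions, not the paper's exact pipeline.

```python
# Wavelet-decomposition input builder for the non-ideal-weather branch.
import numpy as np
import pywt

def make_input(pv_power, weather, level=3, wavelet="db4"):
    """pv_power: (T,) series; weather: (T, F) meteorological parameters."""
    coeffs = pywt.wavedec(pv_power, wavelet, level=level)
    # Reconstruct each component back to full length T so they stack evenly.
    components = []
    for i in range(len(coeffs)):
        kept = [c if j == i else np.zeros_like(c) for j, c in enumerate(coeffs)]
        components.append(pywt.waverec(kept, wavelet)[: len(pv_power)])
    channels = np.stack(components +
                        [weather[:, f] for f in range(weather.shape[1])])
    return channels                       # (level + 1 + F, T) "image"

x = make_input(np.random.rand(288), np.random.rand(288, 4))
print(x.shape)                            # (8, 288) for level=3 and 4 features
```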