5 articles found
Research on Winograd Convolution Acceleration for Modern GPUs
1
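Authors: 童敢 (Tong Gan), 黄立波 (Huang Libo), 吕雅帅 (Lü Yashuai). 《电子学报》 (Acta Electronica Sinica), EI, CAS, CSCD, PKU Core. 2024, No. 1, pp. 244-257 (14 pages)
Convolution is an indispensable component of modern convolutional neural networks and also the most time-consuming one. To address the performance of convolution operators, fast convolution algorithms including the fast Fourier transform (FFT) and Winograd have been proposed. Winograd convolution can improve inference performance for small convolution kernels and is currently the mainstream implementation method in convolutional neural networks. However, its implementations in many highly optimized deep neural network libraries and deep learning compilers are relatively inefficient. Because of the complex data dependencies among the four stages of Winograd convolution, optimizing it for GPUs is very challenging. This paper optimizes the performance of the Winograd convolution operator for modern GPU architectures. It proposes an equivalent transformation of the Winograd computation stage together with a synchronization-free implementation that uses Tensor Cores, and further proposes PKF (Partial Kernel Fusion), a partial kernel fusion method that exploits different levels of the GPU memory hierarchy. A high-performance Winograd convolution is implemented on top of the Tensor Virtual Machine (TVM) and the code reconstructor PKF-Reconstructor (Partial Kernel Fusion Reconstructor). Evaluation on convolution operators from convolutional neural networks in real applications shows that the proposed algorithm achieves a 7.58x to 13.69x speedup over cuDNN.
Keywords: Winograd convolution; low precision; partial kernel fusion; convolution acceleration; GPU memory hierarchy; Tensor Core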
Distributed Resource Allocation in Dispersed Computing Environment Based on UAV Track Inspection in Urban Rail Transit
2
Authors: Tong Gan, Shuo Dong, +1 author, Shiyou Wang, Jiaxin Li. Computers, Materials & Continua, SCIE, EI. 2024, No. 7, pp. 643-660 (18 pages)
With the rapid development of urban rail transit, existing track detection suffers from problems such as low efficiency and insufficient detection coverage, so an intelligent and automatic track detection method based on UAVs is urgently needed to avoid major safety accidents. At the same time, the geographical distribution of IoT devices results in inefficient use of the significant computing potential held by a large number of devices. The Dispersed Computing (DCOMP) architecture addresses this by enabling collaborative computing between devices in the Internet of Everything (IoE), promoting low-latency and efficient cross-wide applications, and meeting users' growing needs for computing performance and service quality. This paper examines the resource allocation challenge within a dispersed computing environment that uses UAVs for track inspection. The system takes into account both resource constraints and computational constraints and transforms the optimization problem into an energy minimization problem with computational constraints. A Markov Decision Process (MDP) model is employed to capture the connection between the dispersed computing resource allocation strategy and the system environment. A method based on Double Deep Q-Network (DDQN) is then introduced to derive the optimal policy, and an experience replay mechanism is implemented to tackle the issue of increasing dimensionality. Experimental simulations validate the efficacy of the method across various scenarios.
Keywords: UAV track inspection; dispersed computing; resource allocation; deep reinforcement learning; Markov decision process
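The abstract above names Double Deep Q-Network (DDQN) as the learning method without giving details. As a rough illustration only, and not the authors' implementation, the generic Double DQN target can be sketched as follows: the online network selects the next action and the target network evaluates it. The function name, argument names, and sample numbers are hypothetical; the paper's actual state, action, and reward definitions are not stated in the abstract.

```python
def double_dqn_target(reward, next_q_online, next_q_target, gamma=0.99, done=False):
    """Generic Double DQN bootstrap target for a single transition.

    next_q_online: Q-values of the next state from the online network.
    next_q_target: Q-values of the next state from the target network.
    Both are plain lists indexed by action (hypothetical inputs).
    """
    if done:
        # Terminal transition: no bootstrapping from the next state.
        return reward
    # Action selection uses the online network...
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    # ...while action evaluation uses the target network.
    return reward + gamma * next_q_target[best_action]


# Hypothetical numbers: the online network prefers action 1,
# so the target network's value for action 1 is used.
example = double_dqn_target(1.0, [0.2, 0.9, 0.5], [1.0, 2.0, 3.0], gamma=0.5)
```

Separating selection from evaluation is what distinguishes Double DQN from plain DQN, which would take the maximum of the target network's values directly.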
A Survey of Research on Winograd Fast Convolution (cited by: 2)
3
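Authors: 童敢 (Tong Gan), 黄立波 (Huang Libo). 《计算机科学与探索》 (Journal of Frontiers of Computer Science and Technology), CSCD, PKU Core. 2022, No. 5, pp. 959-971 (13 pages)
Convolutional neural networks (CNNs) have been widely applied in many fields and play an important role. The convolution operator is the basic building block of a CNN and also its most time-consuming part. In recent years, researchers have proposed several fast convolution algorithms, including ones based on FFT and Winograd. Among them, Winograd convolution quickly became the first choice for fast convolution on small-kernel convolution operators because it greatly reduces the number of multiplications in convolution and uses less memory. However, existing work focuses on generalizing and extending the algorithm and on implementing it on various architectures; no one has yet systematically summarized the Winograd convolution algorithm. To give later researchers a detailed reference, this survey summarizes the work done since Winograd convolution was introduced. It first describes the Winograd minimal filtering algorithm and the introduction of Winograd convolution, presents generalizations and extensions of Winograd convolution, and compares existing implementations. It then covers optimization work on Winograd convolution from three angles (sparse pruning, low precision and quantization, and numerical stability) and details the strengths and weaknesses of the specific methods. Next, it classifies and summarizes implementations and optimizations on various architectures, compares the general optimization methods available on each platform, and describes practical applications of Winograd convolution. Finally, it briefly summarizes the content, analyzes the limitations of existing research, and offers a preliminary outlook on possible future directions.
Keywords: Winograd convolution; fast convolution algorithm; convolutional neural network (CNN); convolution optimization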
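The abstract above states that Winograd convolution reduces the number of multiplications. A minimal sketch of the standard one-dimensional case F(2,3), which produces 2 outputs of a 3-tap filter with 4 multiplications instead of 6, may make this concrete. The function names and sample values are illustrative assumptions and are not taken from the surveyed papers.

```python
def direct_f23(d, g):
    """Direct 3-tap correlation over 4 inputs: 2 outputs, 6 multiplications."""
    return [sum(d[i + k] * g[k] for k in range(3)) for i in range(2)]


def winograd_f23(d, g):
    """Winograd minimal filtering F(2,3): same 2 outputs, 4 multiplications."""
    # Filter transform (can be precomputed once per filter).
    u = [g[0], (g[0] + g[1] + g[2]) / 2, (g[0] - g[1] + g[2]) / 2, g[2]]
    # Input transform (additions and subtractions only).
    v = [d[0] - d[2], d[1] + d[2], d[2] - d[1], d[1] - d[3]]
    # Element-wise products: the only 4 multiplications on the data path.
    m = [u[i] * v[i] for i in range(4)]
    # Output transform (additions and subtractions only).
    return [m[0] + m[1] + m[2], m[1] - m[2] - m[3]]


# Illustrative inputs: both routes should agree up to rounding.
d = [1.0, 2.0, 3.0, 4.0]
g = [1.0, 0.0, -1.0]
direct = direct_f23(d, g)
fast = winograd_f23(d, g)
```

The division by 2 in the filter transform is one source of the numerical-stability concerns the survey mentions, since larger tile sizes use transform constants that amplify rounding error.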
A Survey on Technologies and Challenges of LTE-U
4
Authors: Tong Gan, Shi-You Wang, +2 authors, Qiang Ma, Yi-Dong Jia, Yun-Yun Ma. Computer Systems Science & Engineering, SCIE, EI. 2022, No. 4, pp. 321-337 (17 pages)
The rapid growth of mobile data traffic has put great pressure on limited spectrum resources, and better methods are needed to deal with this problem. Long-Term Evolution (LTE) in the unlicensed spectrum, known as LTE-Unlicensed (LTE-U), has been proposed to alleviate the shortage of authorized band resources. LTE-U has opened up considerable potential capacity in mobile communication systems with limited authorized spectrum resources and has improved the spectrum utilization of unauthorized frequency bands. However, LTE-U still faces challenges in its application. In this paper, we summarize the key features of LTE-U and the coexistence of LTE-U with Wi-Fi in the unlicensed spectrum. We analyze the key technologies (including carrier aggregation, HARQ, interference cancelation, and centralized scheduling), the operating modes and deployment scenarios (including carrier aggregation LTE-U, duty cycle LTE-U, and standalone LTE-U), and the advantages (including anchored LTE-U and standalone LTE-U scenarios), as well as the main technical challenges. We then address the different management mechanisms of LTE-U and Wi-Fi (including the differences between the MAC layer and physical layer), the classification of coexistence technologies (including channel separation and channel sharing technologies), and directions for future work. We hope that this comprehensive survey spurs further research in this promising area.
Keywords: LTE-U; unlicensed spectrum; coexistence technology
Communication-based positioning systems: past, present and prospects
5
Authors: Guan-Yi Ma, Qing-Tao Wan, Tong Gan. Research in Astronomy and Astrophysics, SCIE, CAS, CSCD. 2012, No. 6, pp. 601-624 (24 pages)
This paper reviews positioning systems in the context of communication systems. First, the basic positioning technique is described for location based service (LBS) in mobile communication systems. Then the high integrity global positioning system (iGPS) is introduced: what it is and how the low Earth orbit (LEO) Iridium telecommunication satellites enhance the global positioning system (GPS). Emphasis is on the Chinese Area Positioning System (CAPS), which is mainly based on commercial geostationary (GEO) communication satellites, including decommissioned GEO and inclined geosynchronous communication satellites. Characterized by its low cost, high flexibility, wide-area coverage and ample frequency resources, CAPS has the distinctive feature that its navigation messages are generated on the ground, then uploaded to and forwarded by the communication satellites. Fundamental principles and key technologies applied in the construction of CAPS are presented in detail, from the CAPS validation phase to its experimental system setup. The outlook for CAPS is that it can become a seamless, high accuracy, large capacity navigation and communication system by expanding it worldwide and enhancing it with LEO satellites and mobile base stations. Hence, this system is a potential candidate for the next generation of radio navigation after GPS.
Keywords: satellite navigation; communication; mobile positioning; CAPS; iGPS; LBS