Journal Articles
3 articles found
Practice Teaching Reform of the Digital Logic and Digital Systems Course Oriented toward System Capability (cited 3 times)
1
Authors: Wei Jizeng, Wang Jianrong, Li Youmeng, Yu Yongxin, Wang Li, Luo Tao. Research and Exploration in Laboratory (《实验室研究与探索》), CAS, Peking University Core, 2022, Issue 10, pp. 179-183, 225 (6 pages)
To address the problem that the Digital Logic and Digital Systems course for computer science majors at domestic universities cannot meet the talent-cultivation needs of the information field, this work takes system capability cultivation as the goal and a virtual-real combined experimental platform as the carrier. Centered on the main-line task of designing a single-cycle 32-bit MIPS (Microprocessor without Interlocked Pipelined Stages) processor, it proposes a three-layer progressive "point-line-plane" practice teaching system, which threads the course's core knowledge points vertically through three dimensions: after-class exercises, experimental teaching, and comprehensive practice. Through this practice teaching reform, students receive thorough training in computer system capability, laying a solid foundation of knowledge and ability for developing more complex pipelined processors and systems-on-chip in subsequent courses.
Keywords: computer system capability; digital logic and digital systems; processor design; progressive practice teaching
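The main-line task above, a single-cycle processor in which every instruction is fetched, decoded, and executed in one clock cycle, can be illustrated with a minimal software sketch. This is a hypothetical teaching aid, not material from the course; it decodes only three MIPS I instructions (addiu, addu, lw) and omits control flow entirely.

```python
# Minimal sketch of a single-cycle 32-bit MIPS-style datapath: each call to
# step() fetches, decodes, and executes exactly one instruction, mimicking
# one clock cycle of a single-cycle design. Only addiu, addu, and lw are
# decoded; encodings follow the standard MIPS I instruction formats.

class SingleCycleMIPS:
    def __init__(self, program):
        self.regs = [0] * 32          # 32 general-purpose registers
        self.mem = {}                 # sparse word-addressed data memory
        self.pc = 0
        self.program = program        # list of 32-bit instruction words

    def step(self):
        instr = self.program[self.pc // 4]   # instruction fetch
        op = (instr >> 26) & 0x3F            # decode: opcode field
        rs = (instr >> 21) & 0x1F
        rt = (instr >> 16) & 0x1F
        rd = (instr >> 11) & 0x1F
        imm = instr & 0xFFFF
        if imm & 0x8000:                     # sign-extend 16-bit immediate
            imm -= 0x10000
        if op == 0x00 and (instr & 0x3F) == 0x21:    # addu rd, rs, rt
            self.regs[rd] = (self.regs[rs] + self.regs[rt]) & 0xFFFFFFFF
        elif op == 0x09:                             # addiu rt, rs, imm
            self.regs[rt] = (self.regs[rs] + imm) & 0xFFFFFFFF
        elif op == 0x23:                             # lw rt, imm(rs)
            self.regs[rt] = self.mem.get(self.regs[rs] + imm, 0)
        self.regs[0] = 0                     # $zero is hardwired to 0
        self.pc += 4                         # next sequential instruction
```

A pipelined processor, the follow-on goal the abstract mentions, would split this single step into stages (fetch, decode, execute, memory, write-back) that overlap across instructions.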
Fast Fourier transform convolutional neural network accelerator based on overlap addition
2
Authors: You Chen, Li Dejian, Feng Xi, Shen Chongfei, Wei Jizeng, Liu Yu. The Journal of China Universities of Posts and Telecommunications, EI, 2024, Issue 5, pp. 71-84 (14 pages)
In convolutional neural networks (CNNs), the floating-point computation in the traditional convolutional layer is enormous, and the execution speed of the network is limited by intensive computing, which makes it challenging to meet the real-time response requirements of complex applications. This work exploits the principle that the time-domain convolution result equals the frequency-domain point-wise multiplication result to reduce the amount of floating-point calculation in convolution. The input feature map and the convolution kernel are converted to the frequency domain by the fast Fourier transform (FFT), the corresponding point-wise multiplication is performed, and the frequency-domain result is then converted back to the time domain to obtain the convolution output. In a typical CNN, the input feature map is much larger than the convolution kernel, which leads to many invalid operations; the overlap-addition method is proposed to reduce these invalid calculations and further speed up network execution. This work designs a hardware accelerator for frequency-domain convolution and verifies its efficiency on the Xilinx Zynq UltraScale+ MPSoC ZCU102 board. On the ImageNet dataset, the frequency-domain convolution accelerator computes Visual Geometry Group 16 (VGG16) 8.5 times faster than traditional time-domain convolution.
Keywords: convolutional neural network (CNN); fast Fourier transform (FFT); overlap addition
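The core idea of the paper, FFT point-wise multiplication applied block by block via overlap-add, can be sketched in a few lines of NumPy. This is an illustrative 1-D software model, not the paper's hardware design; the block size is an arbitrary choice for the example.

```python
import numpy as np

def fft_conv_overlap_add(signal, kernel, block=64):
    """Linear convolution via overlap-add: split the long input into blocks,
    convolve each block with the kernel by FFT point-wise multiplication,
    and add the overlapping tails of adjacent blocks back together."""
    k = len(kernel)
    n = block + k - 1                 # FFT length for one block's linear conv
    K = np.fft.rfft(kernel, n)        # kernel spectrum, computed only once
    out = np.zeros(len(signal) + k - 1)
    for start in range(0, len(signal), block):
        seg = signal[start:start + block]
        S = np.fft.rfft(seg, n)
        y = np.fft.irfft(S * K, n)    # time-domain conv == freq-domain product
        tail = min(n, len(out) - start)
        out[start:start + tail] += y[:tail]   # overlap-add of block results
    return out
```

Because the kernel spectrum is reused for every block, the per-block cost drops to one forward FFT, one point-wise product, and one inverse FFT, which is the source of the savings the accelerator exploits when the feature map is much larger than the kernel.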
Bypass-Enabled Thread Compaction for Divergent Control Flow in Graphics Processing Units
3
Authors: Li Bingchao, Wei Jizeng, Guo Wei, Sun Jizhou. Journal of Shanghai Jiaotong University (Science), EI, 2021, Issue 2, pp. 245-256 (12 pages)
Graphics processing units (GPUs) employ single instruction multiple data (SIMD) hardware to run threads in parallel while allowing each thread to maintain an arbitrary control flow. Threads running concurrently within a warp may jump to different paths after conditional branches. Such divergent control flow leaves some lanes idle and hence reduces the SIMD utilization of GPUs. To alleviate the waste of SIMD lanes, threads from multiple warps can be collected together and compacted into idle lanes to improve SIMD lane utilization. However, this mechanism induces extra barrier synchronizations, since warps have to stall to wait for other warps before compaction, so that in some cases no warps can be scheduled. In this paper, we propose an approach to reduce the overhead of the barrier synchronizations induced by compactions. In our approach, a compaction is bypassed by warps whose threads all jump to the same path after a branch. Moreover, warps waiting for a compaction can also bypass it when no warps are ready for issuing. In addition, a compaction is canceled if it cannot reduce the number of idle lanes. Experimental results demonstrate that our approach provides an average improvement of 21% over the baseline GPU for applications with massive divergent branches, while recovering the performance loss induced by compactions by 13% on average for applications with many non-divergent control flows.
Keywords: graphics processing unit (GPU); single instruction multiple data (SIMD); thread; warps; bypass
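The bypass idea above can be modeled in a few lines: uniform warps, whose lanes all take the same path, skip compaction entirely, while the taken-path threads of divergent warps are repacked densely into full warps. This is a toy functional model under assumed 4-lane warps, not the paper's hardware mechanism, and it tracks only the taken path for brevity.

```python
# Toy model of inter-warp thread compaction with bypass. Each warp is a
# list of per-lane branch outcomes (True = taken). Uniform warps bypass
# compaction; divergent warps contribute their taken threads to a pool
# that is repacked into as few full warps as possible (idle lanes = False).

WARP_SIZE = 4  # assumed lane count for this sketch

def compact_with_bypass(warps):
    bypassed, taken_pool = [], []
    for w in warps:
        if all(w) or not any(w):       # uniform warp: no divergence, bypass
            bypassed.append(w)
        else:                          # divergent warp: collect taken lanes
            taken_pool.extend(lane for lane in w if lane)
    # repack taken threads densely, padding the last warp with idle lanes
    compacted = [taken_pool[i:i + WARP_SIZE]
                 for i in range(0, len(taken_pool), WARP_SIZE)]
    if compacted and len(compacted[-1]) < WARP_SIZE:
        compacted[-1] += [False] * (WARP_SIZE - len(compacted[-1]))
    return bypassed, compacted
```

In the sketch, the bypassed warps never wait at the compaction barrier, which is exactly the synchronization overhead the paper's approach removes for non-divergent control flow.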