KCPNet: Design, Deployment, and Application of Tensor-Decomposed Lightweight Convolutional Module (cited by: 3)
Abstract To address the high memory consumption and low computational efficiency of existing convolutional modules in practical applications, a lightweight, efficient convolutional module with a bottleneck structure (KCPNet) is proposed on the basis of Kronecker CANDECOMP/PARAFAC (KCP) tensor decomposition. An ordinary convolution is first decomposed by a 2nd-order KCP decomposition. The resulting factor tensors are mapped to two 1×1 convolutions, which transform the input and output channels, and to two separable (depthwise) convolutions with changeable channel counts, which perform feature extraction. These four convolutional layers are then assembled into the KCPNet module with a bottleneck structure. KCPNet is deployed on an embedded GPU using the OpenCL parallel programming framework, and a dynamic gesture recognition application is developed around a pico-flexx depth camera. Experimental results show that, on the large-scale benchmark dataset ImageNet, KCPNet balances space and computational efficiency at similar accuracy when compared with existing convolutional modules such as ResNet, ResNeXt, and tensor-decomposed modules; on the medium-scale benchmark dataset CIFAR-10, KCPNet compresses the traditional VGG model to 16.1% of its original size and saves 75.5% of the computation without obvious accuracy loss; on the embedded GPU, the parallel deployment of KCPNet reaches a recognition speed of 100 frames/s on CIFAR-10. The gesture recognition application built around KCPNet achieves 99.5% accuracy and runs at more than 100 frames/s with a memory cost of 22 MB.
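The saving from a four-layer bottleneck of this kind can be illustrated with a rough parameter count. The sketch below is not the paper's formulation: it assumes a hypothetical layout of a 1×1 convolution (input channels to a bottleneck width), two k×k depthwise convolutions on that width, and a final 1×1 convolution (width to output channels). The function names, the `width` parameter, and the channel sizes are illustrative assumptions, and the "changeable channel" behaviour of the separable layers is not modelled. The last helper only converts the percentages reported in the abstract into fold reductions.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in an ordinary k x k convolution (bias ignored)."""
    return c_in * c_out * k * k


def bottleneck_params(c_in, c_out, k, width):
    """Weights in an ASSUMED four-layer bottleneck (bias ignored).

    Assumed layout, not taken from the paper:
      1x1 conv (c_in -> width), two k x k depthwise convs on `width`
      channels, 1x1 conv (width -> c_out).
    """
    reduce_1x1 = c_in * width          # channel reduction
    depthwise = 2 * width * k * k      # one k x k filter per channel, twice
    expand_1x1 = width * c_out         # channel expansion
    return reduce_1x1 + depthwise + expand_1x1


def fold_reduction(remaining_fraction):
    """Turn a 'fraction remaining' into an 'x-fold smaller' figure."""
    return 1.0 / remaining_fraction


if __name__ == "__main__":
    # Illustrative sizes only; they do not come from the paper.
    c_in, c_out, k, width = 256, 256, 3, 64
    dense = standard_conv_params(c_in, c_out, k)
    slim = bottleneck_params(c_in, c_out, k, width)
    print("standard conv weights:", dense)
    print("assumed bottleneck weights:", slim)
    print("fraction remaining:", round(slim / dense, 4))

    # Figures reported in the abstract for VGG on CIFAR-10.
    print("size reduction (16.1% kept):", round(fold_reduction(0.161), 2))
    print("compute reduction (75.5% saved):", round(fold_reduction(1 - 0.755), 2))
```

Under these assumptions the two 1×1 layers dominate the cost, so the chosen bottleneck width largely determines the compression; the actual KCPNet ranks and channel settings would have to be taken from the paper itself.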
Authors WANG Dingheng; ZHAO Guangshe; YAO Man; LI Guoqi (School of Automation Science and Engineering, Xi'an Jiaotong University, Xi'an 710049, China; Department of Precision Instrumentation, Tsinghua University, Beijing 100084, China)
Source Journal of Xi'an Jiaotong University (indexed in EI, CAS, CSCD, Peking University Core), 2022, No. 3, pp. 135-146 (12 pages)
Funding National Natural Science Foundation of China (61876215).
Keywords tensor decomposition; Kronecker CANDECOMP/PARAFAC tensor decomposition; lightweight convolutional module; parallel deployment; gesture recognition