Abstract
How to build a network that extracts and classifies spatiotemporal features is a key problem in human action recognition. Existing methods tend to extract spatiotemporal features at a single scale and rely on complex network structures. To address these problems, this paper proposes a Multiscale Channels separation Spatiotemporal convolution Network (MCST-Net) that combines an attention mechanism with multiscale spatiotemporal information. First, building on spatiotemporal convolution, an MCST module with a residual-like structure splits and fuses the features along the channel dimension. This both reduces the number of network parameters and yields spatiotemporal receptive fields at multiple scales, so the network extracts richer spatiotemporal features. Second, an improved non-local attention module (INLA) is introduced to model global dependencies among features at low computational cost, so the model extracts the key information in the features more efficiently. Extensive experiments were conducted on the classic action recognition datasets UCF101 and HMDB51. The results show that MCST-Net achieves higher recognition accuracy than current mainstream action recognition algorithms, extracts multiscale spatiotemporal features effectively, and has the advantages of a simple structure, few parameters, and strong generalization ability.
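The abstract does not say how the MCST module splits and fuses channels, so the following is only a rough arithmetic sketch under an assumed design: the channels are cut into equal groups, the first group is passed through unchanged, and each later group is convolved with a 3×3×3 kernel after receiving the previous group's output. That layout, the function names `conv3d_params` and `split_module_stats`, and the sizes (64 channels, 4 groups) are all hypothetical and not taken from the paper. The sketch shows why such a split could both cut parameters and produce several receptive-field sizes.

```python
def conv3d_params(c_in, c_out, k=3):
    """Weight count of a k x k x k 3D convolution (bias ignored)."""
    return c_in * c_out * k ** 3


def split_module_stats(channels, groups, k=3):
    """Parameter count and per-group receptive fields for the ASSUMED layout.

    Assumption (not from the paper): group 0 is an identity path and each
    group i > 0 applies one k x k x k convolution to its own channels plus
    the output of group i - 1, so group i sits behind i chained convolutions.
    """
    if channels % groups != 0:
        raise ValueError("channels must be divisible by groups")
    width = channels // groups
    # One small convolution per group except the identity group.
    params = (groups - 1) * conv3d_params(width, width, k)
    # Each chained convolution widens the receptive field by k - 1 per axis.
    fields = [1 + i * (k - 1) for i in range(groups)]
    return params, fields


plain = conv3d_params(64, 64)            # single full-width convolution
params, fields = split_module_stats(64, 4)
print(plain, params, fields)
```

With these assumed sizes, the split layout uses fewer than a fifth of the weights of one full-width convolution, while its groups see receptive fields of 1, 3, 5, and 7 along each axis.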
Authors
秦宇龙
王永雄
胡川飞
邵杭
QIN Yu-long; WANG Yong-xiong; HU Chuan-fei; SHAO Hang (School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China)
Source
《小型微型计算机系统》
CSCD
PKU Core (Peking University Core Journals)
2021, No. 9, pp. 1802-1809 (8 pages)
Journal of Chinese Computer Systems
Funding
Supported by the National Natural Science Foundation of China (61673276).
Keywords
deep learning
action recognition
multiscale spatiotemporal features
attention mechanism