Efficient Video Classification Method Based on Global Spatiotemporal Receptive Field

Abstract: In existing video classification methods built on hybrid convolutional neural network architectures (2D+3D), convolution filters operate only on local regions and therefore cannot capture long-range spatiotemporal dependencies; feature channels lack interdependence; and conventional 3D convolution kernels do not model spatiotemporal features well. To address these issues, this paper proposes an efficient video classification method based on a global spatiotemporal receptive field (CS-NL-SECO). First, the conventional 3D convolution kernel is decomposed into a spatial convolution kernel and a temporal convolution kernel to better learn spatiotemporal features. Second, channel and spatial attention is introduced into the low-level 2D network of the existing hybrid architecture; the weight of each feature channel is learned automatically, and these weights are used to emphasize important features and suppress irrelevant background. Finally, a global spatiotemporal receptive field is introduced into the high-level 3D network, which learns global spatiotemporal feature representations and automatically captures long-range spatiotemporal dependencies. Experiments on four public datasets commonly used for video classification (UCF101, HMDB51, Kinetics, and Something-something) show that the method is far better than the original method in both speed and accuracy, and that its overall performance reaches the level of state-of-the-art methods.
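
The first step in the abstract, splitting a 3D kernel into a spatial kernel and a temporal kernel, can be illustrated by counting weights. The sketch below is not the paper's implementation: it assumes a generic factorization in which a t x k x k convolution is replaced by a 1 x k x k spatial convolution followed by a t x 1 x 1 temporal convolution. The function names `conv3d_params` and `factorized_params`, the intermediate channel count `c_mid`, and the layer sizes are all hypothetical choices made for illustration.

```python
def conv3d_params(c_in, c_out, t, k):
    """Weight count of a full t x k x k 3D convolution (bias ignored)."""
    return c_in * c_out * t * k * k


def factorized_params(c_in, c_mid, c_out, t, k):
    """Weight count of a 1 x k x k spatial convolution followed by a
    t x 1 x 1 temporal convolution (bias ignored).

    c_mid is the number of channels between the two convolutions;
    it is an assumption of this sketch, not a value from the paper.
    """
    spatial = c_in * c_mid * k * k   # spatial kernel: sees one frame at a time
    temporal = c_mid * c_out * t     # temporal kernel: sees one pixel across frames
    return spatial + temporal


# Illustrative layer: 64 input channels, 64 output channels, 3 x 3 x 3 kernel.
full = conv3d_params(64, 64, 3, 3)
split = factorized_params(64, 64, 64, 3, 3)
print(full, split)
```

With these illustrative sizes the weight count drops from 110592 to 49152. Choosing a larger `c_mid` would trade some of that saving for capacity; the abstract does not state which setting the authors use.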

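Authors: WANG Hui-tao; HU Yan (School of Computer, Wuhan University of Technology, Wuhan 430070, China)

Affiliation: School of Computer, Wuhan University of Technology

Source: Journal of Chinese Computer Systems (小型微型计算机系统), indexed in CSCD and the Peking University Core Journals list, 2020, No. 8, pp. 1768-1775 (8 pages)

Funding: Supported by the Key Program of the Natural Science Foundation of Hubei Province (2017CFA012) and the Natural Science Foundation of Hubei Province (2019CFC919).

Keywords: video classification; convolutional neural network; channel and spatial attention; global spatiotemporal receptive field; separable 3D convolution kernels

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]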

