Abstract
To address the limited recognition ability of existing video human behavior recognition models, and the high time cost of two-stream recognition methods caused by their sensitivity to lighting conditions, we propose a ResNeXt model based on an attention mechanism for recognizing human behavior in video. Preprocessed video frames serve as the model input, and the convolutional network uses a 101-layer ResNeXt network for its core residual blocks. Building on the three-dimensional ResNeXt convolutional neural network, an attention mechanism is introduced to strengthen important feature channels, improving the feature representation and robustness of the network. Starting from a model pretrained on Kinetics, we train on the UCF-101 and HMDB-51 datasets; after 200 iterations, the recognition rates on the validation sets reach 96.0% and 69.9%, respectively. The experimental results show that the model effectively recognizes spatiotemporal features in video and achieves higher accuracy than previous recognition models. The model keeps the network deep while avoiding feature loss and preventing overfitting, and recognition accuracy also improves, which shows that the model is effective and feasible.
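The abstract does not specify the form of the attention mechanism. As a minimal sketch, assuming a squeeze-and-excitation-style channel attention applied to a 3D feature map of shape (N, C, T, H, W), the reweighting of feature channels could look like the following. The function name `channel_attention_3d`, the reduction ratio `r`, and the random weights `w1` and `w2` are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def channel_attention_3d(x, w1, w2):
    """SE-style channel attention (assumed form) for a (N, C, T, H, W) feature map."""
    # Squeeze: global average over time and space gives one descriptor per channel.
    s = x.mean(axis=(2, 3, 4))                    # shape (N, C)
    # Excitation: bottleneck MLP with ReLU, then a sigmoid gate per channel.
    h = np.maximum(s @ w1, 0.0)                   # shape (N, C // r)
    gates = 1.0 / (1.0 + np.exp(-(h @ w2)))       # shape (N, C), values in (0, 1)
    # Reweight: scale every channel of the feature map by its gate.
    return x * gates[:, :, None, None, None], gates


# Hypothetical sizes: 2 clips, 8 channels, 4 frames, 7x7 spatial grid, reduction ratio 4.
rng = np.random.default_rng(0)
N, C, T, H, W, r = 2, 8, 4, 7, 7, 4
x = rng.standard_normal((N, C, T, H, W))
w1 = rng.standard_normal((C, C // r)) * 0.1       # stand-in for learned weights
w2 = rng.standard_normal((C // r, C)) * 0.1       # stand-in for learned weights
y, gates = channel_attention_3d(x, w1, w2)
```

In a trained network the two weight matrices would be learned, so channels that carry useful spatiotemporal information would receive gates closer to 1 and less useful channels gates closer to 0.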
Authors
LI Jian-ping (李建平)
LAI Yong-qian (赖永倩)
School of Computer and Information Technology, Northeast Petroleum University, Daqing 163318, China
Source
Computer Technology and Development (《计算机技术与发展》)
2023, No. 4, pp. 69-74 (6 pages)
Funding
Key Program of the National Natural Science Foundation of China (61933007).
Keywords
deep learning
residual network
three-dimensional convolutional network
video behavior recognition
attention mechanism