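Abstract: Most mainstream human action recognition methods are based on convolutional neural networks (CNNs). However, CNNs tend to ignore spatial position information in videos, which weakens action recognition in the spatial domain of the video. Conventional CNNs also cannot quickly locate key feature positions, and their training cannot be effectively parallelized, which lowers efficiency. To address these limitations of conventional CNNs in handling the temporal domain and parallel computation, a human action recognition algorithm based on the Vision Transformer (ViT) and the 3D convolutional network for learning spatiotemporal features (C3D) is proposed. C3D extracts multidimensional feature maps from the video, and ViT's feature-slicing windows partition these multidimensional features for global feature modeling; the encoder-decoder module of the Transformer then predicts the human action in the video. Experimental results show that the proposed algorithm improves action recognition accuracy on the UCF-101 and HMDB51 datasets.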
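The pipeline in this abstract (C3D feature maps sliced into ViT-style patches, then processed by Transformer attention to predict an action) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' model: the feature map is random data standing in for a real C3D output, the weights are untrained, and a single encoder-style self-attention layer with mean pooling replaces the full Transformer encoder-decoder. The shapes (4×8×8×16 features, 4×4 patches, 32-dimensional embeddings) and the helper names `patchify`, `self_attention`, `init_params`, and `predict` are hypothetical; the 101 output classes match the number of action classes in UCF-101.

```python
import numpy as np

def patchify(feat, p):
    """Split a (T, H, W, C) feature map into non-overlapping p x p spatial patches per
    frame and flatten each patch into one token of length p*p*C."""
    t, h, w, c = feat.shape
    if h % p or w % p:
        raise ValueError("H and W must be divisible by the patch size p")
    x = feat.reshape(t, h // p, p, w // p, p, c)
    x = x.transpose(0, 1, 3, 2, 4, 5)  # (T, H/p, W/p, p, p, C): group pixels by patch
    return x.reshape(t * (h // p) * (w // p), p * p * c)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens, wq, wk, wv):
    """Single-head scaled dot-product self-attention: each token attends to all tokens,
    so every output mixes information from every frame and spatial position."""
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    return softmax(q @ k.T / np.sqrt(q.shape[-1])) @ v

def init_params(rng, token_dim, d_model, n_tokens, n_classes):
    """Random (untrained) weights; a real model would learn these (hypothetical setup)."""
    s = 0.02
    return {
        "embed": rng.normal(0, s, (token_dim, d_model)),
        "pos": rng.normal(0, s, (n_tokens, d_model)),
        "wq": rng.normal(0, s, (d_model, d_model)),
        "wk": rng.normal(0, s, (d_model, d_model)),
        "wv": rng.normal(0, s, (d_model, d_model)),
        "head": rng.normal(0, s, (d_model, n_classes)),
    }

def predict(feat, p, params):
    """Patch embedding -> one residual attention layer -> mean pool -> class probabilities."""
    tokens = patchify(feat, p) @ params["embed"] + params["pos"]
    h = tokens + self_attention(tokens, params["wq"], params["wk"], params["wv"])
    return softmax(h.mean(axis=0) @ params["head"])

# Usage: a stand-in for a C3D feature map with 4 temporal steps, an 8x8 grid, 16 channels.
rng = np.random.default_rng(0)
feat = rng.normal(size=(4, 8, 8, 16))
params = init_params(rng, token_dim=4 * 4 * 16, d_model=32, n_tokens=16, n_classes=101)
probs = predict(feat, p=4, params=params)
print(probs.shape, int(probs.argmax()))
```

Slicing the feature map into patches turns it into a token sequence, so one attention step can relate any spatial position in any frame to any other, and all tokens are processed with matrix products that parallelize well; this matches the motivation the abstract gives for moving beyond conventional CNNs.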
Funding: Supported by the National Natural Science Foundation of China (No. 60072048) and the Doctoral Program Fund (No. 20010561007).
Abstract: An iterative separation approach, in which source signals are extracted and removed one by one, is proposed for multichannel blind deconvolution of colored signals. Each source signal is extracted in two stages: a filtered version of the source signal is first obtained by solving a generalized eigenvalue problem, and this is followed by single-channel blind deconvolution based on ensemble learning. Simulations demonstrate that the approach performs multichannel blind deconvolution efficiently.
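The extract-then-remove loop in this abstract can be sketched as follows. This is a simplified sketch under explicit assumptions, not the paper's method: it assumes an instantaneous (non-convolutive) mixture, extracts each source with an AMUSE-style generalized eigenvalue problem between the lag-τ and zero-lag covariance matrices (taking the eigenvector with the largest |λ|, i.e. the most temporally correlated component), removes it by least-squares regression, and omits the ensemble-learning single-channel deconvolution stage entirely. The function names, the lag `tau=1`, the tolerance, and the AR(1) test sources are hypothetical choices for illustration.

```python
import numpy as np

def lagged_cov(x, tau):
    """Symmetrized lag-tau covariance of zero-mean signals x (channels x samples)."""
    n = x.shape[1]
    c = x[:, tau:] @ x[:, : n - tau].T / (n - tau)
    return 0.5 * (c + c.T)

def extract_one(x, tau=1, tol=1e-10):
    """Return y = w @ x, where w solves R_tau w = lam R_0 w with the largest |lam|.

    The generalized eigenproblem is solved by whitening with R_0 and diagonalizing the
    whitened R_tau; near-zero directions of R_0 (left by earlier deflation) are dropped.
    """
    d, e = np.linalg.eigh(lagged_cov(x, 0))
    keep = d > tol * d.max()
    whiten = (e[:, keep] / np.sqrt(d[keep])).T  # (rank, channels)
    lam, v = np.linalg.eigh(lagged_cov(whiten @ x, tau))
    w = whiten.T @ v[:, np.argmax(np.abs(lam))]  # demixing vector in channel space
    return w @ x

def deflate(x, y):
    """Remove the extracted component from every channel by least-squares regression."""
    a = x @ y / (y @ y)
    return x - np.outer(a, y)

def iterative_separation(x, n_sources, tau=1):
    """Extract sources one by one, deflating the mixture after each extraction."""
    x = x - x.mean(axis=1, keepdims=True)
    out = []
    for _ in range(n_sources):
        y = extract_one(x, tau)
        out.append(y)
        x = deflate(x, y)
    return np.array(out)

def ar1(phi, n, rng):
    """Colored test source: first-order autoregressive process driven by white noise."""
    s = np.zeros(n)
    noise = rng.normal(size=n)
    for t in range(1, n):
        s[t] = phi * s[t - 1] + noise[t]
    return s

# Usage: three AR(1) sources with distinct lag-1 autocorrelations, instantaneously mixed.
rng = np.random.default_rng(1)
s = np.array([ar1(phi, 20000, rng) for phi in (0.95, 0.5, -0.2)])
a_mix = rng.normal(size=(3, 3))
y = iterative_separation(a_mix @ s, n_sources=3)
corr = np.abs(np.corrcoef(np.vstack([y, s]))[:3, 3:])  # |corr| of each estimate vs each source
print(np.round(corr, 3))
```

Because the mixture here is instantaneous, each extracted component is already a source up to scale and sign; in the convolutive setting of the abstract the first stage yields only a filtered version, which is why a second, single-channel deconvolution stage follows. The coloring is essential: white sources would all give λ ≈ 0, and the eigenproblem could not tell them apart.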