Abstract
The classical two-stream convolutional neural network (CNN) captures first-order statistics of the local spatial and temporal features of an input video, and at test time makes its final prediction by averaging the classifier (softmax) scores of multiple local video features. However, first-order statistics cannot fully characterize the distribution of the spatial and temporal features, and the higher-order statistics among the local features are not used at the test stage. To address these two problems, this paper proposes a multi-order information fusion method for human action recognition based on second-order aggregation. First, a second-order two-stream model is built to capture second-order statistics of the local video features, which, together with the original first-order statistics, form the multi-order information. Second, second-order aggregation is applied separately to the local features carrying each order of information, yielding higher-order global video representations. Finally, two strategies are proposed to fuse these representations for prediction. Experiments show that the method improves recognition accuracy over the original two-stream CNN by 8% on HMDB51 and 2.1% on UCF101, and that performance improves further when improved dense trajectory (IDT) features are combined with it.
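The abstract does not give the exact form of the aggregation, so the following is only a minimal NumPy sketch of the general idea, assuming that "first-order" means the mean of the local features and "second-order" means their covariance. The array sizes, the function names, and the final concatenation are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def first_order_pool(features):
    """Mean over N local features: (N, d) -> (d,)."""
    return features.mean(axis=0)


def second_order_pool(features):
    """Covariance over N local features: (N, d) -> (d, d)."""
    centered = features - features.mean(axis=0, keepdims=True)
    return centered.T @ centered / features.shape[0]


# Hypothetical input: 25 local features of dimension 8 from one video.
rng = np.random.default_rng(0)
local_feats = rng.normal(size=(25, 8))

mu = first_order_pool(local_feats)
cov = second_order_pool(local_feats)

# The covariance is symmetric, so its upper triangle carries all of its entries.
# Concatenation is one simple (assumed) way to combine the two orders.
video_repr = np.concatenate([mu, cov[np.triu_indices(8)]])
```

The mean alone discards how feature dimensions vary together across the video; the covariance keeps that pairwise information, which is what a first-order summary cannot express.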
Authors
张冰冰
葛疏雨
王旗龙
李培华
ZHANG Bing-Bing; GE Shu-Yu; WANG Qi-Long; LI Pei-Hua (School of Information and Communication Engineering, Dalian University of Technology, Dalian 116033; College of Intelligence and Computing, Tianjin University, Tianjin 300350)
Source
《自动化学报》
EI
CAS
CSCD
PKU Core Journals (北大核心)
2021, No. 3, pp. 609-619 (11 pages)
Acta Automatica Sinica
Funding
Supported by the National Natural Science Foundation of China (61971086, 61806140, 61471082).
Keywords
Human action recognition
Two-stream convolutional neural network
Multi-order information fusion
Second-order aggregation