Abstract
In video action localization algorithms, temporal resolution falls as the number of pyramid levels grows, so fine-grained features become incomplete and prediction accuracy suffers. To address this problem, this paper proposes a densely connected feature pyramid backbone network. Video frames are fed into the backbone, where the densely connected pyramid extracts frame-level and hierarchical features, linking reference-layer and base-layer features with deep features during feature extraction. In the prediction stage, the frame-level and hierarchical features yield the start and end times of actions and their labels, and this output is fused with the optical-flow output to give the final start/end time and label predictions. On the THUMOS14 dataset, mean average precision (mAP) improves by 0.4% compared with AFSD. The method accurately localizes the start time, end time, and category of actions in video and can be applied to scenarios such as intelligent surveillance.
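The dense connection pattern described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names `avg_pool_time` and `dense_temporal_pyramid`, the use of average pooling, the halving of temporal length per level, and channel concatenation as the fusion step are all assumptions made for illustration. A real network would also apply learned convolutions at each level, which are omitted here, so the sketch shows connectivity only.

```python
from typing import List

Frame = List[float]   # channel values at one time step (illustrative type)
Level = List[Frame]   # one pyramid level: a sequence of time steps


def avg_pool_time(level: Level, factor: int) -> Level:
    """Average non-overlapping groups of `factor` time steps, channel by channel."""
    pooled = []
    for start in range(0, len(level) - factor + 1, factor):
        window = level[start:start + factor]
        pooled.append([sum(ch) / factor for ch in zip(*window)])
    return pooled


def dense_temporal_pyramid(frames: Level, num_levels: int) -> List[Level]:
    """Build a pyramid in which every level receives features from ALL earlier levels.

    Assumption: each level halves the temporal length of the previous one, and
    fusion is plain channel concatenation (hypothetical choices, not from the paper).
    """
    levels = [frames]  # level 0 holds the frame-level features
    for k in range(1, num_levels):
        target_len = len(frames) // (2 ** k)
        if target_len == 0:
            raise ValueError("too many levels for this clip length")
        fused: Level = [[] for _ in range(target_len)]
        for earlier in levels:
            # Bring each earlier level down to the current temporal length ...
            pooled = avg_pool_time(earlier, len(earlier) // target_len)
            # ... and concatenate its channels onto the current level.
            for t in range(target_len):
                fused[t].extend(pooled[t])
        levels.append(fused)
    return levels


# Usage: 8 time steps with 2 channels each, 4 pyramid levels.
frames = [[float(t), 1.0] for t in range(8)]
levels = dense_temporal_pyramid(frames, 4)
print([len(level) for level in levels])     # temporal length per level
print([len(level[0]) for level in levels])  # channel count per level
```

Because shallow levels feed every deeper level directly, the deepest level still has access to frame-level information even though its own temporal length is very short, which is the motivation the abstract gives for dense connections.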
Authors
佟明蔚
毛琳
杨大伟
TONG Ming-wei; MAO Lin; YANG Da-wei (School of Electromechanical Engineering, Dalian Minzu University, Dalian, Liaoning 116605, China)
Source
《大连民族大学学报》
2022, No. 5, pp. 412-417 (6 pages)
Journal of Dalian Minzu University
Funding
National Natural Science Foundation of China (61673084)
Natural Science Foundation of Liaoning Province (20170540192, 20180550866, 2020-MZLH-24).
Keywords
temporal action localization
dense connection
feature pyramid
feature fusion