Temporal Relationship Integrated Context Information for Temporal Action Detection
Abstract Temporal action detection is a challenging task in video understanding. Previous temporal action detection models focus mainly on classifying video frames and ignore the temporal relationships between frames, which degrades their performance. To address this, a temporal action detection method based on enhanced temporal relationship and context information (ETRD) is proposed. First, a global feature encoder based on an enhanced local temporal relationship attention mechanism is designed to attend to the temporal relationships between adjacent frames. Second, a temporal feature enhancement module based on context information is constructed to fuse context information. Finally, a head outputs the classification and regression results. Experimental results show that the proposed ETRD model achieves an average mAP (mean average precision) of 67.5% on THUMOS14 and 36.0% on ActivityNet1.3. Compared with the 66.8% and 35.6% of the Actionformer model, this is an improvement of 0.7 and 0.4 percentage points, respectively. Using visual sensors, the proposed model can detect the category and duration of behaviors. Combined with physiological signals such as heart rate, it can support individual health status management, offering a solution for applications such as telemedicine and intelligent monitoring.
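The abstract does not give the details of the enhanced local temporal relationship attention mechanism, so the following is only a minimal sketch of the general idea of restricting self-attention to neighboring frames. It assumes a single head, no learned query/key/value projections, and a hypothetical `window` parameter; none of these choices are taken from the paper.

```python
import numpy as np


def local_window_attention(x, window=3):
    """Single-head self-attention in which frame t attends only to frames
    within `window` time steps of t.

    x: array of shape (T, d), one feature vector per frame.
    Returns (output, weights) with shapes (T, d) and (T, T).
    Simplification (assumption): queries, keys, and values are x itself,
    with no learned projection matrices.
    """
    T, d = x.shape
    # Scaled dot-product similarity between every pair of frames.
    scores = x @ x.T / np.sqrt(d)
    # Boolean mask that keeps only pairs at temporal distance <= window.
    idx = np.arange(T)
    mask = np.abs(idx[:, None] - idx[None, :]) <= window
    scores = np.where(mask, scores, -np.inf)
    # Numerically stable softmax over each row; masked entries become 0.
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ x, weights


# Usage: 8 frames with 4-dimensional features, each attending to +/- 1 frame.
rng = np.random.default_rng(0)
features = rng.normal(size=(8, 4))
out, w = local_window_attention(features, window=1)
```

Each row of `w` sums to 1 and is zero outside the chosen window, so every output frame is a weighted average of its temporal neighbors only. A full model would add learned projections, multiple heads, and the context-based enhancement module and detection head that the abstract mentions.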
Authors WANG Meng; YANG Guanci (Key Laboratory of Advanced Manufacturing Technology of the Ministry of Education, Guizhou University, Guiyang 550025, China)
Source Journal of Guizhou University: Natural Sciences, 2024, Issue 6, pp. 78-84, 90 (8 pages in total)
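Funding National Natural Science Foundation of China (62163007, 62373116); Guizhou Provincial Science and Technology Program (Qiankehe Platform Talents [2020]6007-2, Qiankehe Support [2023] General 118).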
Keywords temporal action detection; temporal relationship; context information; multi-head attention mechanism; video action understanding
