Temporal proposal optimization for temporal action detection (cited by: 2)

Abstract

Objective: Temporal action detection is an active topic in computer vision. Its goal is to detect the intervals in which actions occur in a video and to determine the category of each action. The task has far-reaching practical significance, but localizing actions quickly in long videos and detecting them remains challenging. This paper therefore focuses on localizing and optimizing the set of temporal candidates in which actions occur, and proposes TPO (temporal proposal optimization), a temporal action detection method based on optimized temporal proposals. Method: A convolutional neural network (CNN) and a bidirectional long short-term memory network (BLSTM) capture the local temporal correlations and the global temporal information of the video. A connectionist temporal classification (CTC) method is introduced to evaluate the boundary probability and the actionness probability score at each temporal position. Finally, the two probability score curves are fused to refine and rank the temporal proposals, which yields the detection of actions in time. Result: In experiments on the ActivityNet v1.3 dataset, TPO reaches 74.66, 66.32, and 30.5 on average recall at 100 proposals (AR@100), area under the curve (AUC), and average mean average precision (mAP), respectively. The mAP at intersection-over-union thresholds (mAP@IoU) of 0.75 and 0.95 reaches 30.73 and 8.22, respectively. Compared with SSN (structured segment network), TCN (temporal context network), Prop-SSAD (single shot action detector for proposal), CTAP (complementary temporal action proposal), and BSN (boundary sensitive network), TPO improves on all of these metrics. Conclusion: The proposed model accounts for both the global and the local temporal information of a video, which makes the predicted proposal boundaries more accurate and flexible. The results also verify that more accurate proposals effectively improve the precision of temporal action detection.

Extended abstract

Objective: With the ubiquity of electronic equipment such as cellphones and cameras, massive amounts of video recording people's daily activities and behaviors are stored and transmitted. A growing number of video-based applications, such as video surveillance, have attracted the attention of researchers. However, real-world videos are usually long and untrimmed. The long untrimmed videos in publicly available datasets for temporal action detection contain many ambiguous frames and a large number of background frames, so accurately locating action proposals and recognizing action labels is difficult. Similar to object proposal generation in object detection, temporal action detection can be divided into two phases: the first determines the duration (starting and ending timestamps) of each action, and the second identifies the category of each action instance. Single-action classification in trimmed videos has been extremely successful, whereas the performance of temporal action proposal generation remains unsatisfactory, and training proposal generation models is time-consuming. High-quality proposals contribute to the performance of action detection, and research on temporal proposal generation helps locate video content effectively and efficiently and facilitates the understanding of untrimmed videos. In this work, we focus on optimizing temporal action proposals for action detection.

Method: We aim to improve action detection by optimizing temporal action proposals, that is, by accurately localizing the boundaries of actions in long untrimmed videos. We propose a temporal proposal optimization (TPO) model for detecting candidate action proposals. TPO combines convolutional neural networks (CNNs) and bidirectional long short-term memory (BLSTM) to capture local and global temporal cues simultaneously. We also introduce connectionist temporal classification (CTC) optimization, which excels at parsing global feature-level classification labels. The global actionness probability calculated by BLSTM and CTC corrects several inexact temporal cues in the local CNN actionness probability. A probability fusion strategy based on the local and global actionness probabilities thus improves the accuracy of the temporal boundaries of actions and leads to promising temporal action detection performance. TPO is composed of three modules: the local actionness evaluation module (LAEM), the global actionness evaluation module (GAEM), and the post-processing module (PPM). The extracted features are fed into LAEM and GAEM, which generate the local and global actionness probabilities along the temporal dimension, respectively. LAEM is a temporal CNN-based module, and GAEM predicts the global actionness probabilities with the help of BLSTM and the CTC loss. LAEM outputs three sequences: starting and ending probabilities in addition to the local actionness probabilities. The candidate temporal proposals are built from the crossings of the starting and ending probability curves. GAEM captures global actionness probabilities that complement LAEM. The local and global actionness probabilities are then fed into PPM to obtain a fused actionness probability curve. Subsequently, we sample the actionness probability curves through linear interpolation to extract proposal-level features, which are fed into a multilayer perceptron to obtain a confidence score. We use the confidence score to rank the candidate proposals and adopt Soft-NMS (soft non-maximum suppression) to remove redundant proposals. Finally, we apply an existing classification model to the generated proposals to evaluate the detection performance of TPO.

Result: We validate the proposed model on two evaluations, action proposal generation and action detection. Experimental results indicate that TPO outperforms other state-of-the-art methods on the ActivityNet v1.3 dataset. For proposal generation, we compare our model with SSN, TCN, Prop-SSAD, CTAP, and BSN. TPO performs best, with an AR@100 of 74.66 and an AUC of 66.32. For temporal action detection, we report mean average precision at temporal intersection-over-union thresholds (mAP@tIoU). Compared with existing methods, including SCC (semantic cascade context), CDC (convolutional-de-convolutional), SSN, and BSN, TPO achieves the best mAPs of 30.73 and 8.22 at tIoU thresholds of 0.75 and 0.95, respectively, and the best average mAP of 30.5. Notably, mAP decreases as the tIoU threshold increases. The tIoU metric reflects the overlap between the generated proposals and the ground truth, so a high tIoU threshold places strict constraints on candidate proposals. That TPO achieves the best mAP at the high tIoU thresholds (0.75 and 0.95) indicates that it generates accurate proposals that overlap closely with the ground-truth action instances, which in turn improves detection performance.

Conclusion: In this paper, we propose TPO, a novel model for temporal proposal generation that addresses the action detection problem and achieves promising performance on ActivityNet v1.3. Experimental results demonstrate its effectiveness. TPO generates temporal proposals with precise boundaries and flexible durations, and thereby covers sequential actions that occupy variable-length intervals in videos.
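The abstract relies on two standard building blocks: temporal IoU (tIoU), which measures how much a proposal overlaps another segment, and Soft-NMS, which suppresses redundant proposals. The following is a minimal sketch of both, not the authors' implementation. It assumes proposals are `(start, end, score)` tuples in seconds and uses the Gaussian decay variant of Soft-NMS; the function names and the values of `sigma` and `min_score` are illustrative assumptions, since the abstract does not state them.

```python
import math


def temporal_iou(a, b):
    """tIoU of two segments whose first two fields are (start, end)."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def soft_nms(proposals, sigma=0.5, min_score=0.001):
    """Gaussian Soft-NMS over (start, end, score) proposals.

    Overlapping proposals are not deleted outright; their scores are
    multiplied by exp(-tIoU^2 / sigma). sigma and min_score are
    assumed values for illustration only.
    """
    remaining = [list(p) for p in proposals]
    kept = []
    while remaining:
        # Select the highest-scoring proposal still in the pool.
        best = max(range(len(remaining)), key=lambda i: remaining[i][2])
        top = remaining.pop(best)
        kept.append(tuple(top))
        # Decay the scores of the others according to their overlap with it.
        for p in remaining:
            p[2] *= math.exp(-temporal_iou(top, p) ** 2 / sigma)
        # Discard proposals whose score has decayed to almost nothing.
        remaining = [p for p in remaining if p[2] >= min_score]
    return kept


if __name__ == "__main__":
    candidates = [(0.0, 10.0, 0.9), (1.0, 11.0, 0.8), (20.0, 30.0, 0.7)]
    for start, end, score in soft_nms(candidates):
        print(f"{start:5.1f} {end:5.1f} {score:.3f}")
```

In this example the second candidate overlaps the first heavily, so it is kept but ends up ranked last with a reduced score, while the non-overlapping third candidate keeps its original score. The same `temporal_iou` function is what an mAP@tIoU evaluation would use to decide whether a detection matches a ground-truth segment at a given threshold.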

Authors: Xiong Chengxin (熊成鑫); Guo Dan (郭丹); Liu Xueliang (刘学亮) (School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230601, China)

Affiliation: School of Computer Science and Information Engineering, Hefei University of Technology

Source: Journal of Image and Graphics (《中国图象图形学报》), indexed in CSCD and the Peking University Core Journals list, 2020, No. 7, pp. 1447-1458 (12 pages)

Keywords: temporal action detection; temporal action proposals; actionness probability; connectionist temporal classification (CTC); convolutional neural network (CNN); bidirectional long short-term memory (BLSTM)

Classification number: TP301.6 [Automation and Computer Technology: Computer System Architecture]

