LSN: Long-Term Spatio-Temporal Network for Video Recognition


Abstract: Although recurrent neural networks (RNNs) are widely used to process temporal or sequential data, they have attracted relatively little attention in current video action recognition applications. This work therefore attempts to model the long-term spatio-temporal information of video with a variant of the RNN, i.e., the higher-order RNN. We propose a novel long-term spatio-temporal network (LSN) for this video task, the core of which integrates newly constructed high-order ConvLSTM (HO-ConvLSTM) modules with traditional 2D convolutional blocks. Specifically, each HO-ConvLSTM module consists of an accumulated temporary state (ATS) module and a standard ConvLSTM module: the ATS module accumulates several previous hidden states into one temporary state, which then enters the standard ConvLSTM and, together with the current input, determines the output. The HO-ConvLSTM module can be inserted into different stages of a 2D convolutional neural network (CNN) in a plug-and-play manner, and can thus characterize long-term temporal evolution at various spatial resolutions. Experimental results on three commonly used video benchmarks demonstrate that the proposed LSN model achieves performance competitive with representative models.
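The higher-order recurrence described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation. The abstract does not say how the ATS module combines previous hidden states, so an element-wise average over the last `order` states is assumed here. Scalar weights stand in for the convolution kernels of a real ConvLSTM, and all names (`accumulate_states`, `lstm_step`, `run_higher_order`) are hypothetical.

```python
import math
from collections import deque


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def accumulate_states(history):
    """Stand-in for the ATS module: element-wise average of stored hidden states.

    Averaging is an assumption; the abstract only says the states are
    "accumulated" into one temporary state.
    """
    n = len(history)
    return [sum(vals) / n for vals in zip(*history)]


def lstm_step(x, h_tmp, c, w):
    """Standard LSTM gating applied element-wise.

    Scalar weights replace convolution kernels, so this is a simplification
    of a ConvLSTM cell, not a ConvLSTM itself.
    """
    h_new, c_new = [], []
    for xi, hi, ci in zip(x, h_tmp, c):
        i = sigmoid(w["xi"] * xi + w["hi"] * hi)    # input gate
        f = sigmoid(w["xf"] * xi + w["hf"] * hi)    # forget gate
        o = sigmoid(w["xo"] * xi + w["ho"] * hi)    # output gate
        g = math.tanh(w["xg"] * xi + w["hg"] * hi)  # candidate cell value
        c_next = f * ci + i * g
        c_new.append(c_next)
        h_new.append(o * math.tanh(c_next))
    return h_new, c_new


def run_higher_order(frames, order=3, weights=None):
    """Run the toy higher-order recurrence over a list of flattened frames."""
    keys = ("xi", "hi", "xf", "hf", "xo", "ho", "xg", "hg")
    w = weights or {k: 0.5 for k in keys}
    size = len(frames[0])
    # Keep only the last `order` hidden states; start from a zero state.
    history = deque([[0.0] * size], maxlen=order)
    c = [0.0] * size
    outputs = []
    for x in frames:
        h_tmp = accumulate_states(history)  # one temporary state from several
        h, c = lstm_step(x, h_tmp, c, w)    # standard update uses h_tmp and x
        history.append(h)
        outputs.append(h)
    return outputs


frames = [[1.0, -1.0], [0.5, 0.0], [0.0, 2.0], [1.0, 1.0]]
out_first_order = run_higher_order(frames, order=1)
out_third_order = run_higher_order(frames, order=3)
```

With `order=1` the temporary state is just the previous hidden state, so the loop reduces to an ordinary first-order recurrence. Larger values of `order` let older hidden states influence each update directly.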

Authors: Zhenwei Wang, Wei Dong, Bingbing Zhang, Jianxin Zhang

Affiliations: School of Computer Science and Engineering; SEAC Key Laboratory of Big Data Applied Technology; Institute of Machine Intelligence and Bio-Computing; School of Information and Communication Engineering

Source: Proceedings of the International Conference of Pioneering Computer Scientists, Engineers and Educators (ICPCSEE), 2022, Issue 1, pp. 326-338 (13 pages)

Funding: Supported by the National Natural Science Foundation of China (61972062, 61902220), the Young and Middle-aged Talents Program of the National Civil Affairs Commission, and the University-Industry Collaborative Education Program (201902029013).

Keywords: Video action recognition; High-order RNN; Long-term spatio-temporal; ConvLSTM; HO-ConvLSTM

Classification code: G63 [Cultural Sciences: Education]

