Scene-constrained spatial-temporal graph convolutional network for pedestrian trajectory prediction


Abstract

Objective: Pedestrian trajectory prediction is essential for domains such as unmanned vehicles, security surveillance, and social robotics, where it helps computer systems make better decisions and plans. Most current methods use only pedestrian trajectory information. Human-to-human social interaction alone, however, cannot account for the spatial constraints that scene elements place on pedestrian motion: a pedestrian's future location cannot lie inside a building wall, and pedestrians at building corners undergo large changes in velocity direction as they turn. The few existing methods that do incorporate scene information flatten the scene image into a one-dimensional vector and merge it with the trajectory information, so the association between scene and motion is learned only implicitly. This distorts the two-dimensional spatial signal of the scene and gives no intuitive explanation of how the scene modulates the motion of an individual pedestrian. Separately, recent methods based on graph attention networks (GAT) build a spatiotemporal graph in which pedestrians are the nodes, trajectory features are the node attributes, and spatial interactions between pedestrians are the edges. These methods learn social interactions among pedestrians at the global scale, but in crowded scenes the attention mechanism may fail to assign appropriate weights to each pedestrian, which lowers accuracy. To address both problems, we propose a scene-constrained spatial-temporal graph convolutional neural network, called Scene-STGCNN. It aggregates pedestrian motion states with a graph convolutional network over local interactions, which achieves accurate aggregation with a small number of parameters, and it includes a scene-based fine-tuning module that takes changes in the neighboring scene as input and explicitly models the modulating effect of the scene on pedestrian motion.

Method: Scene-STGCNN consists of a motion module, a scene-based fine-tuning module, a spatiotemporal convolution, and a spatiotemporal extrapolation convolution. In the motion module, the graph convolution is a convolutional neural network (CNN) layer with a 1×1 kernel that embeds pedestrian velocity information. The residual convolution is a CNN layer with a 1×1 kernel followed by a BatchNorm (BN) layer. The temporal convolution is a BN layer, a PReLU layer, a CNN layer with a 3×1 kernel, a BN layer, and a Dropout layer. The motion module takes the pedestrian velocity spatiotemporal graph and the scene mask matrix as input, encodes the velocity graph with these convolutions, and fuses the pedestrian spatiotemporal features of the observed frames. Because it extracts local pedestrian spatiotemporal features, it avoids the limitation of learning interactions in a global mode. The scene-based fine-tuning module first uses scene-change information from temporally neighboring frames to generate a scene-based pedestrian spatiotemporal map, then embeds this map with a scene convolution to obtain the scene mask matrix. The mask is combined by Hadamard product with the intermediate motion features of the motion module, which explicitly models the real-time regulation of pedestrians by the scene and has a physical interpretation in real scenes. The spatiotemporal convolution serves as a transition encoding network; it consists of two temporal gating units and a spatial convolution and strengthens the temporal correlation and contextual spatial dependence of pedestrian motion. The extrapolation convolution then generates a trajectory distribution parameterized as a two-dimensional Gaussian. The loss function is the negative log-likelihood of the ground-truth trajectory under kernel density estimation; minimizing it enhances the multimodality of the Scene-STGCNN prediction distribution while reducing prediction error.

Result: Scene-STGCNN is compared with seven other mainstream methods on the public datasets ETH (comprising ETH and HOTEL) and UCY (comprising UNIV, ZARA1, and ZARA2). Averaged over the datasets and relative to the second-best model, the average displacement error (ADE) is reduced by 12% and the final displacement error (FDE) by 9%. Ablation experiments on the same datasets verify the effectiveness of the scene-based fine-tuning module: the module effectively models the modulating effect of the scene on pedestrian trajectories and thereby reduces prediction error. A qualitative analysis examines the inherent patterns of pedestrian motion captured by Scene-STGCNN and its prediction distributions; the visualizations show that the model learns pedestrian motion patterns effectively while maintaining accurate predictions.

Conclusion: We propose a pedestrian trajectory prediction model, Scene-STGCNN, that effectively fuses scene information with trajectory features through a scene-based fine-tuning module. It learns pedestrian interactions in a local mode while adjusting trajectory features in real time according to scene features, and it outperforms the other mainstream methods compared. By modeling the modulating effect of the scene on pedestrian motion, Scene-STGCNN also shows potential as a basis for scene-aware pedestrian trajectory prediction methods.
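
The quantities named in the abstract can be made concrete with a small sketch. The ADE and FDE functions below follow the standard definitions. The `kde_nll` function is only an illustration of a negative log-likelihood under kernel density estimation: the abstract does not specify the kernel, the bandwidth, or the number of samples, so the isotropic Gaussian kernel, the `bandwidth` value, and all function names here are assumptions rather than the authors' implementation.

```python
import math

def ade(pred, truth):
    """Average displacement error: mean Euclidean distance over all predicted steps."""
    return sum(math.dist(p, g) for p, g in zip(pred, truth)) / len(truth)

def fde(pred, truth):
    """Final displacement error: Euclidean distance at the last predicted step."""
    return math.dist(pred[-1], truth[-1])

def kde_nll(samples, truth, bandwidth=0.5):
    """Mean per-step negative log-likelihood of the true trajectory under a
    Gaussian KDE fitted to sampled trajectories.

    samples: list of K sampled trajectories, each a list of (x, y) points.
    bandwidth: kernel standard deviation (an assumed value, not from the paper).
    """
    norm = 2.0 * math.pi * bandwidth ** 2  # normaliser of an isotropic 2D Gaussian
    total = 0.0
    for t, g in enumerate(truth):
        # Density at the true position: average of kernels centred on each sample.
        density = sum(
            math.exp(-math.dist(traj[t], g) ** 2 / (2.0 * bandwidth ** 2)) / norm
            for traj in samples
        ) / len(samples)
        total -= math.log(max(density, 1e-12))  # clamp to avoid log(0)
    return total / len(truth)

# Toy data: a straight true path and a prediction that drifts away from it.
truth = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
pred = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
print(ade(pred, truth), fde(pred, truth))
print(kde_nll([pred], truth), kde_nll([truth, pred], truth))
```

In this toy example, adding a sample that lies close to the true path lowers the KDE-based loss, which is the sense in which such a loss rewards a prediction distribution that covers the ground truth.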

Authors: Chen Haodong; Ji Qingge (School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510006, China; Guangdong Key Laboratory of Big Data Analysis and Processing, Guangzhou 510006, China)

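Affiliations: School of Computer Science and Engineering, Sun Yat-sen University; Guangdong Key Laboratory of Big Data Analysis and Processing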

Source: Journal of Image and Graphics (中国图象图形学报), indexed in CSCD and the Peking University Core Journals list, 2023, No. 10, pp. 3163-3175 (13 pages)

Funding: Natural Science Foundation of Guangdong Province (2016A030313288).

Keywords: pedestrian trajectory prediction; spatial constraints of scene; spatio-temporal feature extraction; spatiotemporal convolution; kernel density estimation

Classification: TP391.4 [Automation and Computer Technology: Computer Application Technology]


