Abstract
Pedestrians are vulnerable road users in urban traffic scenes, and accurately predicting their actions is necessary to avoid collisions. In this paper, we address the problem of pedestrian action recognition in urban traffic scenes for the first time and propose a targeted solution. We first introduce a new dataset, the pedestrian action recognition dataset (PARD), which serves as the data basis for our experiments. We then provide an efficient baseline model, MFVGG, which reaches performance comparable to previous state-of-the-art human action recognition methods at a lower computational cost. To address the problem in a more targeted way, we improve the baseline in two respects: first, we leverage pose priors to enrich the feature representations, building a two-stream network that fuses the features encoded by the two branches; second, we introduce a two-stream neural architecture search (NAS) method to obtain the optimal network architecture tailored to this task. Experimental results on PARD show that our method outperforms previous top-performing general human action recognition methods. The dataset and code are publicly available at https://github.com/Yankeegsj/PARD.
Authors
龚申健
张姗姗
郭煜
杨健
陶冶
Shenjian GONG; Shanshan ZHANG; Yu GUO; Jian YANG; Ye TAO (Key Laboratory of Intelligent Perception and Systems for High-Dimensional Information of Ministry of Education and Jiangsu Key Laboratory of Image and Video Understanding for Social Safety, School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China; Horizon Robotics, Beijing 100085, China)
Source
《中国科学:信息科学》
CSCD
Peking University Core Journal
2023, No. 3, pp. 485-499 (15 pages)
Scientia Sinica (Informationis)
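Funding
National Natural Science Foundation of China, International (Regional) Cooperation and Exchange Program (Grant No. 61861136011)
National Natural Science Foundation of China (Grant No. 62172225)
Fundamental Research Funds for the Central Universities (Grant No. 30920032201)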
Keywords
deep learning
computer vision
action recognition
neural architecture search
pose estimation