Multi-Scale Graph Reasoning Model for Video Social Relation Recognition

Abstract: Human social relation recognition, an important problem in video classification, has become an active research topic in computer vision. Because videos carry a large amount of information, much of it redundant, and contain few key frames, accurately identifying the key information in a video for social relation reasoning is crucial. To this end, this paper proposes a multi-scale graph reasoning model for video social relation recognition. First, we extract spatio-temporal features and semantic object information from the video to obtain a rich and robust representation of social relations. Next, multi-scale graph convolution with different receptive fields performs temporal reasoning and captures the interactions between people and semantic objects. In particular, we use an attention mechanism to evaluate the contribution of each semantic object in different scenes. Experimental results on the SRIV dataset show that the proposed method outperforms most state-of-the-art methods.

Authors: Xu Fei (许飞), Zhang Tianyu (张天雨), Shi Junbiao (史俊彪)

Affiliation: School of Computer Science and Information Engineering, Hefei University of Technology

Source: Computer Science and Application (《计算机科学与应用》), 2021, No. 2, pp. 423-434 (12 pages)

Keywords: social relation recognition; multi-scale graph convolution; attention mechanism

Classification: TP3 [Automation and Computer Technology: Computer Science and Technology]