Abstract
A scene graph is a graph structure that describes the content of an image. Its generation faces two problems: (1) two-step scene graph generation methods lose useful information, which makes the task harder; (2) the long-tailed distribution of visual relationships causes the model to overfit, which raises the error rate of relationship reasoning. To address these two problems, this paper proposes SGiF (Scene Graph in Features), a scene graph generation model that combines multi-scale feature maps with ring-type relationship reasoning. First, the likelihood that a visual relationship exists is computed for every feature point on the multi-scale feature maps, and the features of the high-likelihood points are extracted. Then, subject-object pairs are decoded from the extracted features, and the decoded results are deduplicated according to their category differences to obtain the scene graph structure. Finally, rings (cycles) that contain the target relationship edge are detected in the scene graph structure, the other edges on each ring are used as input for computing an adjustment factor, and this factor adjusts the original relationship reasoning result to complete the scene graph. SGGen and PredCls are used as the evaluation settings. Experiments on a subset of the large-scale scene graph generation dataset VG (Visual Genome) show that, with multi-scale feature maps, SGiF raises the hit rate of visual relationship detection by 7.1% over the two-step baseline, and that, with ring-type relationship reasoning, SGiF raises the hit rate of relationship reasoning by 2.18% over the baseline without ring-type reasoning, which demonstrates the effectiveness of SGiF.
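The ring detection step described in the abstract can be illustrated with a small sketch. The abstract does not give the authors' algorithm, so everything here is an assumption made for illustration: the function name `rings_through_edge`, the `max_len` bound, the choice to ignore edge direction while searching, and the toy graph. The computation of the adjustment factor is not specified in the abstract and is deliberately left out.

```python
from collections import defaultdict


def rings_through_edge(edges, target, max_len=3):
    """Collect, for each ring containing `target`, the ring's other edges.

    edges   : iterable of (subject, object) pairs forming the scene graph.
    target  : the (subject, object) pair whose relationship is being inferred.
    max_len : maximum number of edges in a ring, counting the target edge.
    Edge direction is ignored during the search (an assumption of this sketch).
    """
    edge_set = set(edges)
    adj = defaultdict(set)
    for a, b in edge_set:
        adj[a].add(b)
        adj[b].add(a)

    subj, obj = target
    rings = []

    def original(a, b):
        # Map an undirected step back to the edge as stored in the graph.
        return (a, b) if (a, b) in edge_set else (b, a)

    def dfs(node, path):
        # `path` lists the nodes walked so far, starting at the object.
        for nxt in sorted(adj[node]):
            if nxt == subj:
                # len(path) == 1 would be the target edge itself, so skip it.
                if len(path) >= 2:
                    nodes = path + [subj]
                    rings.append([original(a, b) for a, b in zip(nodes, nodes[1:])])
            elif nxt not in path and len(path) + 1 <= max_len - 1:
                dfs(nxt, path + [nxt])

    dfs(obj, [obj])
    return rings


# Hypothetical toy scene graph: node names are illustrative only.
toy_edges = [("man", "horse"), ("man", "hat"), ("hat", "horse"), ("horse", "grass")]
print(rings_through_edge(toy_edges, ("man", "horse")))
```

In this sketch, the edges returned for each ring are the ones that would be passed to the adjustment factor computation. Bounding the ring length keeps the search cheap on dense graphs; whether the original model uses such a bound is not stated in the abstract.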
Authors
ZHUANG Zhi-gang (庄志刚); XU Qing-lin (许青林)
School of Computer, Guangdong University of Technology, Guangzhou 510000, China
Source
Computer Science (《计算机科学》)
Indexed in: CSCD; Peking University Core Journals (北大核心)
2020, Issue 4, pp. 136-141 (6 pages)
Funding
Science and Technology Planning Project of Guangdong Province (2016B030306003).
Keywords
Scene graph generation
Multi-scale feature map
Ring-type relationship reasoning
Convolutional neural networks
Image understanding