
提取全局语义信息的场景图生成算法

Global semantic information extraction based scene graph generation algorithm
Abstract

Objective  A scene graph describes an image concisely and in a structured way: objects are represented as nodes and their relationships as edges. Existing scene graph generation methods concentrate on visual features and neglect the rich semantic information in the dataset, although semantic information yields more robust features and improves inference. The dataset also has a long-tailed distribution: the 30 frequent relationship classes account for 69% of the samples, while the 20 rare relationship classes cover only 31%. As a result, most methods cannot reason well about low-frequency triplets and tend to predict high-frequency ones instead. Moreover, most existing methods use the same network structure to infer object and relationship classes, without tailoring it to either task. To address these problems, this paper proposes a scene graph generation algorithm that extracts global semantic information.

Method  The network consists of four modules: semantic encoding, feature encoding, target inference, and relationship reasoning. The semantic encoding module first maps the words in the region descriptions to low-dimensional vectors through word embedding. Because the Word2Vec model is trained on a large corpus, its embeddings capture word semantics well. We run Word2Vec over the region descriptions of the dataset and extract the intermediate embedding vectors of the 150 object classes and 50 relationship classes as semantic information. The module also explicitly computes global statistical knowledge, which reflects the global characteristics of the dataset, and fuses it with the semantic information using a graph convolutional network; the resulting global semantic information strengthens the reasoning of rare triplets. The feature encoding module extracts visual features with Faster R-CNN (faster region-based convolutional neural network); we remove its classification head and keep the feature extraction network, the region proposal network, and the region-of-interest pooling layer. In the target inference and relationship reasoning modules, the visual features and the global semantic information are fused with different strategies to obtain global semantic features, and treating objects and relationships differently further improves performance on rare triplets. In the target inference module, the image is represented as a graph and a gated graph neural network aggregates context information; after three iterations the object features are fully refined, and a classifier predicts the object classes from them. The predicted object classes in turn support relationship reasoning. In the relationship reasoning module, the object classes and the global semantic features of relationships are used together, and gated recurrent units refine the features, with each relationship feature aggregating information from its corresponding object pair. Finally, a parser assembles the scene graph, giving a structured description of the image.

Result  We evaluate the method on the public Visual Genome dataset against ten other methods on three tasks: predicate classification, scene graph classification, and scene graph generation. Ablation experiments are also performed. Under the settings that restrict and do not restrict each object pair to a single relationship, the average recall reaches 44.2% and 55.3%, respectively. Compared with the Neural Motifs method, R@50 on scene graph classification improves by 1.3%. In the visualization experiments, the results of the scene graph generation task are shown with object locations and classes marked on the original images and with objects and relationships drawn as nodes and edges; compared with the second-best method in the quantitative analysis, our network markedly strengthens the reasoning of rare relationship classes while also improving the inference of object classes and common relationships.

Conclusion  The proposed algorithm improves the reasoning of rare triplets, remains effective on common triplets, and generates scene graphs effectively.
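To make the semantic-encoding idea concrete, the following PyTorch sketch fuses class-level Word2Vec vectors with dataset-level statistical knowledge (here a co-occurrence matrix) through a single graph-convolution step. The class names, dimensions, and normalization are illustrative assumptions, not the paper's released implementation.

```python
import torch
import torch.nn as nn

class SemanticEncoder(nn.Module):
    """Minimal sketch of the semantic-encoding idea: fuse class-level
    Word2Vec embeddings with global statistical knowledge (a normalized
    co-occurrence matrix) through one graph-convolution step. Dimensions
    and the exact normalization are assumptions, not the paper's spec."""

    def __init__(self, word_vectors: torch.Tensor, cooccurrence: torch.Tensor, out_dim: int = 512):
        super().__init__()
        # word_vectors: (num_classes, emb_dim) Word2Vec vectors of the
        # 150 object and 50 predicate classes, assumed precomputed.
        self.register_buffer("embeddings", word_vectors)
        # Row-normalize the co-occurrence counts (with self-loops) so each
        # node averages over its statistically related neighbours.
        adj = cooccurrence + torch.eye(cooccurrence.size(0))
        self.register_buffer("adj", adj / adj.sum(dim=1, keepdim=True))
        self.proj = nn.Linear(word_vectors.size(1), out_dim)

    def forward(self) -> torch.Tensor:
        # One propagation step: A_hat @ X @ W, followed by a nonlinearity.
        return torch.relu(self.proj(self.adj @ self.embeddings))  # (num_classes, out_dim)

# Stand-in inputs, only to show the shapes involved.
vectors = torch.randn(200, 300)   # placeholder for Word2Vec class vectors
counts = torch.rand(200, 200)     # placeholder for dataset statistics
global_semantic = SemanticEncoder(vectors, counts)()  # (200, 512)
```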
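The target-inference module can be pictured in a similar way. The sketch below treats the detected regions as a fully connected graph and runs three gated context-aggregation steps, matching the three iterations mentioned in the abstract; the mean-pooled message and the 512-dimensional features are assumptions chosen for brevity.

```python
import torch
import torch.nn as nn

class ObjectContext(nn.Module):
    """Illustrative sketch of the target-inference idea: detected regions
    form a fully connected graph and a gated (GRU-style) update aggregates
    context for three iterations before object classification."""

    def __init__(self, dim: int = 512, num_classes: int = 150, steps: int = 3):
        super().__init__()
        self.steps = steps
        self.gru = nn.GRUCell(dim, dim)
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, node_feats: torch.Tensor) -> torch.Tensor:
        # node_feats: (N, dim) fused visual + global semantic features per region
        h = node_feats
        n = h.size(0)
        mask = 1.0 - torch.eye(n, device=h.device)     # exclude self-messages
        for _ in range(self.steps):
            msg = (mask @ h) / max(n - 1, 1)           # mean over the other nodes
            h = self.gru(msg, h)                       # gated context update
        return self.classifier(h)                      # (N, 150) object logits
```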
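For the relationship-reasoning module, a minimal sketch of the gated-recurrent-unit refinement might look as follows, with each relationship feature repeatedly absorbing information from its subject-object pair; the concatenation-based fusion and the fixed number of refinement steps are again assumptions rather than the authors' exact design.

```python
import torch
import torch.nn as nn

class RelationRefiner(nn.Module):
    """Sketch of the relationship-reasoning step: a GRU iteratively refines
    each relationship feature with information from its object pair."""

    def __init__(self, dim: int = 512, steps: int = 3):
        super().__init__()
        self.steps = steps
        self.message = nn.Linear(2 * dim, dim)    # fuse subject and object features
        self.gru = nn.GRUCell(input_size=dim, hidden_size=dim)
        self.classifier = nn.Linear(dim, 50)      # 50 predicate classes in Visual Genome

    def forward(self, rel_feats, obj_feats, pairs):
        # rel_feats: (R, dim) initial global semantic features of relationships
        # obj_feats: (N, dim) refined object features from the target-inference module
        # pairs:     (R, 2) long tensor of (subject_idx, object_idx) per relationship
        h = rel_feats
        for _ in range(self.steps):
            subj = obj_feats[pairs[:, 0]]
            obj = obj_feats[pairs[:, 1]]
            msg = torch.relu(self.message(torch.cat([subj, obj], dim=-1)))
            h = self.gru(msg, h)                  # gated update of the relation state
        return self.classifier(h)                 # (R, 50) predicate logits
```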
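The reported recalls follow the usual triplet Recall@K protocol for scene graph evaluation. A generic per-image version is sketched below; box-matching details such as IoU thresholds, and the graph-constraint rule behind the restricted setting, are deliberately omitted.

```python
def triplet_recall_at_k(pred_triplets, gt_triplets, k=50):
    """Per-image Recall@K: the fraction of ground-truth
    (subject, predicate, object) triplets that appear among the K
    highest-scoring predictions. A generic illustration of the metric,
    not the paper's exact matching rules."""
    top_k = {tuple(t) for t in pred_triplets[:k]}   # assumed sorted by score
    gt = {tuple(t) for t in gt_triplets}
    if not gt:
        return 1.0
    return len(gt & top_k) / len(gt)
```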
Authors: Duan Jingwen (段静雯), Min Weidong (闵卫东), Yang Ziyuan (杨子元), Zhang Yu (张煜), Chen Xinhao (陈鑫浩), Yang Shengbao (杨升宝) (School of Information Engineering, Nanchang University, Nanchang 330031, China; School of Software, Nanchang University, Nanchang 330047, China; Jiangxi Key Laboratory of Smart City, Nanchang 330047, China)
Source: Journal of Image and Graphics (《中国图象图形学报》, CSCD, PKU Core), 2022, No. 7, pp. 2214-2225 (12 pages)
Funding: National Natural Science Foundation of China (62076117, 61762061); Natural Science Foundation of Jiangxi Province (20161ACB20004); Jiangxi Key Laboratory of Smart City Project (20192BCD40002).
Keywords: scene graph; global semantic information; target inference; relationship reasoning; image interpretation