Journal Articles
5 articles found
1. Visual Relationship Detection with Contextual Information (Cited by: 1)
Authors: Yugang Li, Yongbin Wang, Zhe Chen, Yuting Zhu. Computers, Materials & Continua (SCIE, EI), 2020, Issue 6, pp. 1575-1589 (15 pages).
Understanding an image goes beyond recognizing and locating the objects in it; the relationships between objects are also very important to image understanding. Most previous methods have focused on recognizing local predictions of the relationships, but real-world image relationships are often determined by the surrounding objects and other contextual information. In this work, we employ this insight to propose a novel framework for visual relationship detection. The core of the framework is a relationship inference network, a recurrent structure designed to combine the global contextual information of the objects to infer the relationships in the image. Experimental results on Stanford VRD and Visual Genome demonstrate that the proposed method achieves good performance in both efficiency and accuracy. Finally, we demonstrate the value of visual relationships on two computer vision tasks: image retrieval and scene graph generation.
Keywords: visual relationship, deep learning, gated recurrent units, image retrieval, contextual information
2. Graph-based method for human-object interactions detection (Cited by: 1)
Authors: XIA Li-min, WU Wei. Journal of Central South University (SCIE, EI, CAS, CSCD), 2021, Issue 1, pp. 205-218 (14 pages).
Human-object interaction (HOI) detection is a new branch of visual relationship detection that plays an important role in the field of image understanding. Because of the complexity and diversity of image content, the detection of HOIs is still an onerous challenge. Unlike most current works on HOI detection, which rely only on the pairwise information of a human and an object, we propose a graph-based HOI detection method that models context and global structure information. First, to better utilize the relations between humans and objects, the detected humans and objects are regarded as nodes of a fully connected undirected graph, which is then pruned to obtain an HOI graph that preserves only the edges connecting human and object nodes. Then, to obtain more robust features of human and object nodes, two different attention-based feature extraction networks are proposed, which model global and local contexts respectively. Finally, a graph attention network is introduced to iteratively pass messages between the nodes of the HOI graph and detect potential HOIs. Experiments on the V-COCO and HICO-DET datasets verify the effectiveness of the proposed method and show that it is superior to many existing methods.
Keywords: human-object interactions, visual relationship, context information, graph attention network
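The graph construction, pruning, and message-passing steps that this abstract describes can be sketched roughly as follows. This is a minimal toy illustration in plain NumPy, not the authors' implementation; the node names, 4-d features, and single dot-product attention head are all assumptions:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Hypothetical toy setup: 2 detected humans and 2 detected objects.
feats = {"h1": np.array([1.0, 0.0, 0.0, 0.0]),
         "h2": np.array([0.0, 1.0, 0.0, 0.0]),
         "o1": np.array([0.0, 0.0, 1.0, 0.0]),
         "o2": np.array([0.0, 0.0, 0.0, 1.0])}

# Fully connected undirected graph over all detections...
nodes = list(feats)
full_edges = {(a, b) for i, a in enumerate(nodes) for b in nodes[i+1:]}
# ...pruned so that only human-object edges remain (the "HOI graph").
hoi_edges = {(a, b) for a, b in full_edges
             if a.startswith("h") != b.startswith("h")}

def attention_step(feats, edges, W):
    """One round of attention-weighted message passing between nodes."""
    new = {}
    for n in feats:
        nbrs = [b if a == n else a for a, b in edges if n in (a, b)]
        scores = softmax(np.array([feats[n] @ W @ feats[m] for m in nbrs]))
        msg = sum(s * feats[m] for s, m in zip(scores, nbrs))
        new[n] = feats[n] + msg  # residual update of the node feature
    return new

W = np.eye(4)                    # stand-in for learned attention weights
updated = attention_step(feats, hoi_edges, W)
```

In the paper this step would be iterated several times with learned weights; here a single pass with an identity matrix just shows the data flow.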
3. Stacked Attention Networks for Referring Expressions Comprehension
Authors: Yugang Li, Haibo Sun, Zhe Chen, Yudan Ding, Siqi Zhou. Computers, Materials & Continua (SCIE, EI), 2020, Issue 12, pp. 2529-2541 (13 pages).
Referring expressions comprehension is the task of locating the image region described by a natural language expression, which may refer to properties of the region or its relationships with other regions. Most previous work handles this problem by selecting the most relevant regions from a set of candidate regions, which is inefficient when the candidate set is large. Inspired by the recent success of deep learning methods for image captioning, in this paper we propose a framework that understands referring expressions through multiple steps of reasoning. We present a model for referring expressions comprehension that selects the most relevant region directly from the image. The core of our model is a recurrent attention network, which can be seen as an extension of Memory Networks. The proposed model is capable of improving its results through multiple computational hops. We evaluate the proposed model on two referring expression datasets, Visual Genome and Flickr30k Entities. The experimental results demonstrate that the proposed model outperforms previous state-of-the-art methods in both accuracy and efficiency. We also conduct an ablation experiment showing that performance does not keep improving as the number of attention layers increases.
Keywords: stacked attention networks, referring expressions, visual relationship, deep learning
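The multi-hop reasoning this abstract describes can be sketched as a memory-network-style attention loop over region features. A hypothetical toy version; the feature sizes, dot-product scoring, and residual query update are assumptions, not the paper's architecture:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Hypothetical toy inputs: 5 image-region feature vectors and one
# query vector standing in for the encoded referring expression.
rng = np.random.default_rng(0)
regions = rng.normal(size=(5, 8))   # region features
query = rng.normal(size=8)          # encoded expression

def stacked_attention(query, regions, hops=3):
    """Each hop attends over the regions, then refines the query
    with the attended summary (one "computational hop")."""
    q = query
    for _ in range(hops):
        att = softmax(regions @ q)   # relevance of each region to q
        q = q + att @ regions        # residual refinement of the query
    return att                       # final per-region attention

scores = stacked_attention(query, regions)
best_region = int(np.argmax(scores)) # region selected directly
```

The ablation result mentioned in the abstract corresponds to varying `hops` here: beyond a few hops, the attention distribution stops changing meaningfully.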
4. WordleNet: A Visualization Approach for Relationship Exploration in Document Collection
Authors: Xu Wang, Zuowei Cui, Lei Jiang, Wenhuan Lu, Jie Li. Tsinghua Science and Technology (SCIE, EI, CAS, CSCD), 2020, Issue 3, pp. 384-400 (17 pages).
Document collections contain not only rich semantic content but also a diverse range of relationships. We propose WordleNet, an approach to supporting effective relationship exploration in document collections. Existing approaches mainly focus on semantic similarity or a single category of relationships. By constructing a general definition of document relationships, our approach enables the flexible, real-time generation of document relationships that might not otherwise occur to human researchers and may give rise to interesting patterns among documents. Multiple novel visual components are integrated into our approach, whose effectiveness has been verified through a case study, a comparative study, and an eye-tracking experiment.
Keywords: document relationship, interaction techniques, text visualization, relationship visualization, visual analytics
5. Comprehensive Relation Modelling for Image Paragraph Generation
Authors: Xianglu Zhu, Zhang Zhang, Wei Wang, Zilei Wang. Machine Intelligence Research (EI, CSCD), 2024, Issue 2, pp. 369-382 (14 pages).
Image paragraph generation aims to generate a long description composed of multiple sentences, in contrast to traditional image captioning, which produces only one sentence. Most previous methods are dedicated to extracting rich features from image regions and ignore modelling of visual relationships. In this paper, we propose a novel method that generates a paragraph by modelling visual relationships comprehensively. First, we parse an image into a scene graph, where each node represents a specific object and each edge denotes the relationship between two objects. Second, we enrich the object features by implicitly encoding visual relationships through a graph convolutional network (GCN). We further explore high-order relations between different relation features using another graph convolutional network. In addition, we obtain linguistic features by projecting the predicted object labels and their relationships into a semantic embedding space. With these features, we present an attention-based topic generation network that selects relevant features and produces a set of topic vectors, which are then used to generate multiple sentences. We evaluate the proposed method on the Stanford image-paragraph dataset, currently the only available dataset for image paragraph generation, and our method achieves competitive performance in comparison with other state-of-the-art (SOTA) methods.
Keywords: image paragraph generation, visual relationship, scene graph, graph convolutional network (GCN), long short-term memory
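The relationship-encoding step over the scene graph can be illustrated with a minimal sketch of a normalized graph convolution (a Kipf-style GCN layer is assumed here; the toy graph, feature dimensions, and random weights are hypothetical, not the authors' model):

```python
import numpy as np

# Hypothetical toy scene graph: 4 object nodes, edges from a parser.
edges = [(0, 1), (1, 2), (2, 3), (0, 3)]
n, d = 4, 6
rng = np.random.default_rng(1)
H = rng.normal(size=(n, d))          # object node features
W = rng.normal(size=(d, d)) * 0.1    # stand-in for learned layer weights

# Symmetric adjacency with self-loops, degree-normalized.
A = np.eye(n)
for a, b in edges:
    A[a, b] = A[b, a] = 1.0
D_inv_sqrt = np.diag(1.0 / np.sqrt(A.sum(1)))
A_hat = D_inv_sqrt @ A @ D_inv_sqrt

def gcn_layer(H, A_hat, W):
    """One graph convolution: each node's feature is enriched with
    a normalized mix of its neighbors' features, then activated."""
    return np.maximum(A_hat @ H @ W, 0.0)  # ReLU

H1 = gcn_layer(H, A_hat, W)    # first-order relation encoding
H2 = gcn_layer(H1, A_hat, W)   # stacking approximates higher-order relations
```

In the paper a second GCN operates on relation features rather than simply stacking layers; the stacking here is only meant to show how higher-order context accumulates.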