Journal Articles
16 articles found
1. A Fast Panoptic Segmentation Network for Self-Driving Scene Understanding
Authors: Abdul Majid, Sumaira Kausar, Samabia Tehsin, Amina Jameel. Computer Systems Science & Engineering (SCIE, EI), 2022, No. 10, pp. 27-43.
In recent years, scene understanding has gained popularity and significance due to fast-paced progress in computer vision techniques and technologies. The primary focus of computer-vision-based scene understanding is to label every pixel in an image with the category of the object it belongs to, so segmentation and detection must be combined in a single framework. Recently, many successful computer vision methods have been developed to aid scene understanding in a variety of real-world applications. Scene understanding systems typically involve detection and segmentation of different natural and man-made things. A lot of research has been performed in recent years, mostly focused on things (well-defined objects that have shape, orientation, and size) with less focus on stuff classes (amorphous regions that lack a clear shape, size, or other characteristics). Stuff regions describe many aspects of a scene, such as its type, situation, and environment, and hence can be very helpful in scene understanding. Existing methods for scene understanding still have a challenging path to cover in coping with computational time, accuracy, and robustness across varying levels of scene complexity. A robust scene understanding method has to deal effectively with imbalanced class distributions, overlapping objects, fuzzy object boundaries, and poorly localized objects. The proposed method performs panoptic segmentation on the Cityscapes dataset. MobileNet-V2, pre-trained on ImageNet, is used as the backbone for feature extraction, combined with the state-of-the-art encoder-decoder architecture of DeepLabV3+ with some customization and optimization. Atrous convolution along with spatial pyramid pooling is also utilized to make the method more accurate and robust. Very promising and encouraging results have been achieved, indicating the potential of the proposed method for robust scene understanding in a fast and reliable way.
Keywords: panoptic segmentation; instance segmentation; semantic segmentation; deep learning; computer vision; scene understanding; autonomous applications; atrous convolution
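Atrous (dilated) convolution, used in the entry above, enlarges a filter's receptive field without adding parameters by spacing the kernel taps `dilation` samples apart. A minimal 1-D NumPy sketch (the helper name and toy signal are illustrative, not from the paper):

```python
import numpy as np

def dilated_conv1d(x, w, dilation):
    """1-D dilated ('atrous') convolution with valid padding.
    Each output sample sees a window of (k-1)*dilation + 1 input
    samples, so the receptive field grows with the dilation rate
    while the number of kernel weights stays fixed."""
    k = len(w)
    span = (k - 1) * dilation + 1          # receptive field of one output
    out = np.empty(len(x) - span + 1)
    for i in range(len(out)):
        out[i] = sum(w[j] * x[i + j * dilation] for j in range(k))
    return out, span

x = np.arange(16, dtype=float)
w = np.array([1.0, 1.0, 1.0])
for d in (1, 2, 4):
    y, rf = dilated_conv1d(x, w, d)
    print(d, rf, len(y))   # d=1 → rf 3; d=2 → rf 5; d=4 → rf 9
```

With a 3-tap kernel, dilations 1, 2, and 4 give receptive fields of 3, 5, and 9 samples from the same three weights; DeepLabV3+'s atrous spatial pyramid pooling runs several such rates in parallel over 2-D feature maps.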
2. A Survey of Scene Understanding by Event Reasoning in Autonomous Driving (cited: 5)
Authors: Jian-Ru Xue, Jian-Wu Fang, Pu Zhang. International Journal of Automation and Computing (EI, CSCD), 2018, No. 3, pp. 249-266.
Realizing autonomy has been a hot research topic for autonomous vehicles in recent years. For a long time, most efforts toward this goal have concentrated on understanding the scenes surrounding the ego-vehicle (the autonomous vehicle itself). By completing low-level vision tasks, such as detection, tracking, and segmentation of the surrounding traffic participants, e.g., pedestrians, cyclists, and vehicles, the scenes can be interpreted. However, for an autonomous vehicle, low-level vision tasks are largely insufficient for comprehensive scene understanding. What are the past, ongoing, and future states of the scene participants? Answering this deeper question is what steers a vehicle toward truly full automation, just as human drivers do. With this in mind, this paper investigates the interpretation of traffic scenes in autonomous driving from an event reasoning view. To reach this goal, we study the most relevant literature and the state of the art in scene representation, event detection, and intention prediction in autonomous driving. In addition, we discuss the open challenges and problems in this field and endeavor to provide possible solutions.
Keywords: autonomous vehicle; scene understanding; event reasoning; intention prediction; scene representation
3. Structure-aware fusion network for 3D scene understanding
Authors: Haibin YAN, Yating LV, Venice Erin LIONG. Chinese Journal of Aeronautics (SCIE, EI, CAS, CSCD), 2022, No. 5, pp. 194-203.
In this paper, we propose a Structure-Aware Fusion Network (SAFNet) for 3D scene understanding. As 2D images present more detailed information while 3D point clouds convey more geometric information, fusing the two complementary kinds of data can improve the discriminative ability of the model. Fusion is a very challenging task since 2D and 3D data are essentially different and come in different formats. Existing methods first extract 2D multi-view image features, then aggregate them into sparse 3D point clouds, and achieve superior performance. However, they ignore the structural relations between pixels and point clouds and directly fuse the two modalities without adaptation. To address this, we propose a structural deep metric learning method on pixels and points to explore these relations and further utilize them to adaptively map the images and point clouds into a common canonical space for prediction. Extensive experiments on the widely used ScanNetV2 and S3DIS datasets verify the performance of the proposed SAFNet.
Keywords: 3D point clouds; data fusion; structure-aware; 3D scene understanding; deep metric learning
4. Robust Counting in Overcrowded Scenes Using Batch-Free Normalized Deep ConvNet
Authors: Sana Zahir, Rafi Ullah Khan, Mohib Ullah, Muhammad Ishaq, Naqqash Dilshad, Amin Ullah, Mi Young Lee. Computer Systems Science & Engineering (SCIE, EI), 2023, No. 9, pp. 2741-2754.
The analysis of overcrowded areas is essential for flow monitoring, assembly control, and security. Crowd counting's primary goal is to calculate the population in a given region, which requires real-time analysis of congested scenes for prompt reactionary actions. Crowds are always unpredictable, and the available benchmark datasets vary widely, which limits trained models' performance on unseen test data. In this paper, we propose an end-to-end deep neural network that takes an input image and generates a density map of a crowd scene. The proposed model consists of encoder and decoder networks comprising batch-free normalization layers known as evolving normalization (EvoNorm). This allows our network to generalize to unseen data because EvoNorm does not use statistics from the training samples. The decoder network uses dilated 2D convolutional layers to provide large receptive fields with fewer parameters, which enables real-time processing and, thanks to the large receptive field, mitigates the density drift problem. Five benchmark datasets are used in this study to assess the proposed model, leading to the conclusion that it outperforms conventional models.
Keywords: artificial intelligence; deep learning; crowd counting; scene understanding
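For context, a rough NumPy sketch of EvoNorm-S0, the batch-free layer this entry relies on: it combines a Swish-like nonlinearity with a per-sample group standard deviation, so no batch statistics are involved and one sample's output never depends on the others. The (N, C) layout and parameter shapes are simplifying assumptions here; the published layer also spans spatial dimensions:

```python
import numpy as np

def evonorm_s0(x, v, gamma, beta, groups=2, eps=1e-5):
    """Batch-free EvoNorm-S0 sketch:
    y = gamma * x*sigmoid(v*x) / group_std(x) + beta.
    The std is computed per sample over channel groups, never over
    the batch, so train- and test-time behaviour are identical."""
    n, c = x.shape
    xg = x.reshape(n, groups, c // groups)
    std = np.sqrt(xg.var(axis=2, keepdims=True) + eps)   # (n, groups, 1)
    std = np.repeat(std, c // groups, axis=2).reshape(n, c)
    swish = x / (1.0 + np.exp(-v * x))                   # x * sigmoid(v*x)
    return gamma * swish / std + beta

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4))
v = gamma = np.ones(4)
beta = np.zeros(4)
y = evonorm_s0(x, v, gamma, beta)
# Batch-free: a sample alone yields the same output as inside a batch.
print(np.allclose(y[0], evonorm_s0(x[:1], v, gamma, beta)[0]))   # True
```

The final check is exactly the property the abstract exploits for generalization: removing the rest of the "batch" changes nothing.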
5. Semantic segmentation via pixel-to-center similarity calculation
Authors: Dongyue Wu, Zilin Guo, Aoyan Li, Changqian Yu, Nong Sang, Changxin Gao. CAAI Transactions on Intelligence Technology (SCIE, EI), 2024, No. 1, pp. 87-100.
Since the fully convolutional network achieved great success in semantic segmentation, many works have been proposed to extract discriminative pixel representations. However, the authors observe that existing methods still suffer from two typical challenges: (i) the intra-class feature variation between different scenes may be large, making it difficult to maintain consistency between same-class pixels from different scenes; (ii) the inter-class feature distinction within the same scene can be small, limiting the ability to distinguish the classes in each scene. The authors first rethink semantic segmentation from the perspective of similarity between pixels and class centers. Each weight vector of the segmentation head represents its corresponding semantic class over the whole dataset and can be regarded as the embedding of that class center. Thus, pixel-wise classification amounts to computing the similarity in the final feature space between pixels and the class centers. Under this novel view, the authors propose a Class Center Similarity (CCS) layer to address the above challenges by generating adaptive class centers conditioned on each scene and by supervising the similarities between class centers. The CCS layer uses an Adaptive Class Center Module to generate class centers conditioned on each scene, which adapts to the large intra-class variation between different scenes. A specially designed Class Distance Loss (CD Loss) is introduced to control both inter-class and intra-class distances based on the predicted center-to-center and pixel-to-center similarities. Finally, the CCS layer outputs the processed pixel-to-center similarity as the segmentation prediction. Extensive experiments demonstrate that the model performs favourably against state-of-the-art methods.
Keywords: computer vision; deep neural networks; image segmentation; scene understanding
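The "novel view" above, that each weight vector of the segmentation head is a class-center embedding and classification is pixel-to-center similarity, can be illustrated in a few lines of NumPy (the toy centers and pixel features below are illustrative, not the paper's adaptive centers):

```python
import numpy as np

def pixel_to_center_logits(feats, centers):
    """feats: (P, C) pixel embeddings; centers: (K, C) class-center
    embeddings (the segmentation head's weight vectors). Each logit
    is the dot-product similarity between a pixel and a class center,
    so argmax assigns every pixel to its most similar center."""
    return feats @ centers.T                      # (P, K)

# Three orthogonal toy class centers in a 4-D feature space.
centers = np.array([[2.0, 0.0, 0.0, 0.0],
                    [0.0, 2.0, 0.0, 0.0],
                    [0.0, 0.0, 2.0, 0.0]])
# Four "pixels": slightly perturbed copies of centers 0, 2, 1, 1.
feats = centers[[0, 2, 1, 1]] + 0.1
pred = pixel_to_center_logits(feats, centers).argmax(axis=1)
print(pred)   # → [0 2 1 1]
```

The CCS layer's contribution is then to make `centers` scene-adaptive and to supervise both center-to-center and pixel-to-center similarities, rather than keeping one fixed center per class for the whole dataset.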
6. 3D scene graph prediction from point clouds
Authors: Fanfan WU, Feihu YAN, Weimin SHI, Zhong ZHOU. Virtual Reality & Intelligent Hardware (EI), 2022, No. 1, pp. 76-88.
Background: In this study, we propose a novel 3D scene graph prediction approach for scene understanding from point clouds. Methods: The approach automatically organizes the entities of a scene in a graph, where objects are nodes and their relationships are modeled as edges. More specifically, we employ DGCNN to capture the features of objects and their relationships in the scene. A graph attention network (GAT) is introduced to exploit latent features obtained from the initial estimation and further refine the object arrangement in the graph structure. A loss function modified from cross-entropy with a variable weight is proposed to address the multi-category problem in object and predicate prediction. Results: Experiments reveal that the proposed approach performs favorably against state-of-the-art methods in predicate classification and relationship prediction, and achieves comparable performance on object classification. Conclusions: The 3D scene graph prediction approach can form an abstract description of the scene space from point clouds.
Keywords: scene understanding; 3D scene graph; point cloud; DGCNN; GAT
7. Multimodal feature fusion based on object relation for video captioning (cited: 1)
Authors: Zhiwen Yan, Ying Chen, Jinlong Song, Jia Zhu. CAAI Transactions on Intelligence Technology (SCIE, EI), 2023, No. 1, pp. 247-259.
Video captioning aims at automatically generating a natural-language caption describing the content of a video. However, most existing video captioning methods ignore the relationships between objects in the video and the correlation between multimodal features, and they also ignore the effect of caption length on the task. This study proposes a novel video captioning framework (ORMF) based on an object relation graph and multimodal feature fusion. ORMF uses the similarity and spatio-temporal relationships of objects in a video to construct an object relation graph and introduces a graph convolutional network (GCN) to encode the object relations. At the same time, ORMF constructs a multimodal feature fusion network that learns the relationships between features of different modalities and fuses them. Furthermore, the proposed model incorporates a caption-length loss, encouraging captions that carry richer information. Experimental results on two public datasets (Microsoft video captioning corpus [MSVD] and Microsoft research video-to-text [MSR-VTT]) demonstrate the effectiveness of the method.
Keywords: approaches; deep learning; multimodal; scene understanding; video analysis
8. Proximity Based Automatic Data Annotation for Autonomous Driving (cited: 8)
Authors: Chen Sun, Jean M. Uwabeza Vianney, Ying Li, Long Chen, Li Li, Fei-Yue Wang, Amir Khajepour, Dongpu Cao. IEEE/CAA Journal of Automatica Sinica (EI, CSCD), 2020, No. 2, pp. 395-404.
Recent developments in autonomous driving involve high-level computer vision and detailed road scene understanding. Today, most autonomous vehicles employ expensive, high-quality sensor sets such as light detection and ranging (LIDAR) and HD maps with high-level annotations. In this paper, we propose a scalable and affordable data collection and annotation framework, image-to-map annotation proximity (I2MAP), for affordance learning in autonomous driving applications. We provide a new driving dataset built with the proposed framework for driving-scene affordance learning by calibrating the data samples with available tags from online databases such as OpenStreetMap (OSM). Our benchmark consists of 40,000 images with more than 40 affordance labels under various daytime and weather conditions, including very challenging heavy snow. We implemented sample advanced driver-assistance system (ADAS) functions by training neural networks (NN) on our data and cross-validated the results on benchmarks such as KITTI and BDD100K, which indicates the effectiveness of our framework and trained models.
Keywords: affordance learning; autonomous vehicles; data synchronization; scene understanding
9. A semantic-centered cloud control framework for autonomous unmanned system (cited: 2)
Authors: PANG Weijian, LI Hui, MA Xinyi, ZHANG Hailin. Journal of Systems Engineering and Electronics (SCIE, EI, CSCD), 2022, No. 4, pp. 771-784.
Rich semantic information in natural language increases team efficiency in human collaboration, reduces dependence on high-precision data, and improves adaptability to dynamic environments. We propose a semantic-centered cloud control framework for a cooperative multi-unmanned-ground-vehicle (UGV) system. First, semantic modeling of tasks and the environment is implemented via an ontology to build a unified conceptual architecture. Second, a scene semantic information extraction method combining deep learning and semantic web rule language (SWRL) rules is used to realize scene understanding and task-level cloud cooperation. Finally, simulation results show that the framework is a feasible way to enable autonomous unmanned systems to conduct cooperative tasks.
Keywords: scene understanding; cloud control; ontology; autonomous cooperation
10. A New Model of Scenario Comprehension and Practical Progressive Teaching
Authors: Yanhang Zhang, Xiaohong Su, Hongwei Liu, Wei Wang. Computer Education (计算机教育), 2021, No. 12, pp. 89-97.
Taking Digital Logic Design, a professional foundation course for undergraduates in the School of Computer Science at Harbin Institute of Technology, as an example, we propose a new teaching model of scenario comprehension and practical progressive teaching. It responds to several difficult problems in undergraduate teaching, such as the shift of the teaching target to first-year students with no background and a low starting point, and the compression of class time, while course quality and the quality of student training must improve simultaneously. With the help of MOOCs to implement blended teaching, effective means such as lowering the threshold, raising interest, building foundations, and progressive improvement are adopted to help freshmen challenge themselves and move to a higher starting point. This paper is a useful exploration of a new model of high-quality teaching in hardware courses for junior undergraduates.
Keywords: scene understanding; style; practice; progressive; low starting point; high drop point; blended learning
11. Vision Transformers with Hierarchical Attention (cited: 1)
Authors: Yun Liu, Yu-Huan Wu, Guolei Sun, Le Zhang, Ajad Chhatkuli, Luc Van Gool. Machine Intelligence Research (EI, CSCD), 2024, No. 4, pp. 670-683.
This paper tackles the high computational/space complexity of multi-head self-attention (MHSA) in vanilla vision transformers. To this end, we propose hierarchical MHSA (H-MHSA), a novel approach that computes self-attention in a hierarchical fashion. Specifically, we first divide the input image into patches, as is commonly done, and view each patch as a token. The proposed H-MHSA then learns token relationships within local patches, serving as local relationship modeling. Next, the small patches are merged into larger ones, and H-MHSA models global dependencies over the small number of merged tokens. Finally, the local and global attentive features are aggregated to obtain features with powerful representation capacity. Since attention is calculated for only a limited number of tokens at each step, the computational load is reduced dramatically. Hence, H-MHSA can efficiently model global relationships among tokens without sacrificing fine-grained information. With the H-MHSA module incorporated, we build a family of hierarchical-attention-based transformer networks, namely HAT-Net. To demonstrate the superiority of HAT-Net in scene understanding, we conduct extensive experiments on fundamental vision tasks, including image classification, semantic segmentation, object detection, and instance segmentation. HAT-Net thus provides a new perspective for vision transformers. Code and pretrained models are available at https://github.com/yun-liu/HAT-Net.
Keywords: vision transformer; hierarchical attention; global attention; local attention; scene understanding
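The two-stage scheme described above, local attention inside windows followed by global attention over merged tokens, can be sketched in NumPy as follows (single head, no learned projections; the mean-pool merge is a simplifying assumption rather than the exact HAT-Net design):

```python
import numpy as np

def attention(q, k, v):
    """Plain scaled dot-product attention with a stable softmax."""
    s = q @ k.T / np.sqrt(q.shape[-1])
    a = np.exp(s - s.max(axis=-1, keepdims=True))
    a /= a.sum(axis=-1, keepdims=True)
    return a @ v

def hierarchical_attention(x, window):
    """x: (N, C) token sequence, N divisible by `window`.
    1) local attention within each window of `window` tokens;
    2) each window mean-pooled to one token, then global attention
       over that short merged sequence;
    3) the global output broadcast back and added to the local one.
    Attention-score cost drops from N*N to N*window + (N/window)**2."""
    n, c = x.shape
    local = np.concatenate([attention(w, w, w)
                            for w in np.split(x, n // window)])
    merged = x.reshape(n // window, window, c).mean(axis=1)
    global_out = attention(merged, merged, merged)
    return local + np.repeat(global_out, window, axis=0)

x = np.random.default_rng(1).normal(size=(16, 4))
y = hierarchical_attention(x, window=4)
print(y.shape)   # (16, 4)
```

With N = 16 and window = 4, the score matrices shrink from 16×16 = 256 entries to four 4×4 local maps plus one 4×4 global map (80 entries), while every token still receives some global context.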
12. Hybrid-augmented intelligence: collaboration and cognition (cited: 65)
Authors: Nan-ning ZHENG, Zi-yi LIU, Peng-ju REN, Yong-qiang MA, Shi-tao CHEN, Si-yu YU, Jian-ru XUE, Ba-dong CHEN, Fei-yue WANG. Frontiers of Information Technology & Electronic Engineering (SCIE, EI, CSCD), 2017, No. 2, pp. 153-179.
The long-term goal of artificial intelligence (AI) is to make machines learn and think like human beings. Due to the high levels of uncertainty and vulnerability in human life and the open-ended nature of the problems humans face, machines, no matter how intelligent, are unable to completely replace humans. Therefore, it is necessary to introduce human cognitive capabilities or human-like cognitive models into AI systems to develop a new form of AI: hybrid-augmented intelligence. This form of AI or machine intelligence is a feasible and important development model. Hybrid-augmented intelligence can be divided into two basic models: one is human-in-the-loop augmented intelligence with human-computer collaboration, and the other is cognitive-computing-based augmented intelligence, in which a cognitive model is embedded in the machine learning system. This survey describes a basic framework for human-computer collaborative hybrid-augmented intelligence and the basic elements of hybrid-augmented intelligence based on cognitive computing. These elements include intuitive reasoning, causal models, and the evolution of memory and knowledge, especially the role and basic principles of intuitive reasoning in complex problem solving, as well as the cognitive learning framework for visual scene understanding based on memory and reasoning. Several typical applications of hybrid-augmented intelligence in related fields are given.
Keywords: human-machine collaboration; hybrid-augmented intelligence; cognitive computing; intuitive reasoning; causal model; cognitive mapping; visual scene understanding; self-driving cars
13. Intelligent Visual Media Processing: When Graphics Meets Vision (cited: 12)
Authors: Ming-Ming Cheng, Qi-Bin Hou, Song-Hai Zhang, Paul L. Rosin. Journal of Computer Science & Technology (SCIE, EI, CSCD), 2017, No. 1, pp. 110-121.
The computer graphics and computer vision communities have worked closely together in recent years, and a variety of algorithms and applications have been developed to analyze and manipulate the visual media around us. There are three major driving forces behind this phenomenon: 1) the availability of big data from the Internet has created a demand for dealing with an ever-increasing, vast amount of resources; 2) powerful processing tools, such as deep neural networks, provide effective ways to learn how to deal with heterogeneous visual data; 3) new data capture devices, such as the Kinect, bridge the gap between algorithms for 2D image understanding and 3D model analysis. These driving forces have emerged only recently, and we believe that the computer graphics and computer vision communities are still at the beginning of their honeymoon phase. In this work we survey recent research on how computer vision techniques benefit computer graphics techniques and vice versa, covering research on analysis, manipulation, synthesis, and interaction. We also discuss existing problems and suggest possible further research directions.
Keywords: computer graphics; computer vision; survey; scene understanding; image manipulation
14. ARM3D: Attention-based relation module for indoor 3D object detection (cited: 4)
Authors: Yuqing Lan, Yao Duan, Chenyi Liu, Chenyang Zhu, Yueshan Xiong, Hui Huang, Kai Xu. Computational Visual Media (SCIE, EI, CSCD), 2022, No. 3, pp. 395-414.
Relation contexts have proved useful for many challenging vision tasks. In the field of 3D object detection, previous methods have taken advantage of context encoding, graph embedding, or explicit relation reasoning to extract relation contexts. However, redundant relation contexts inevitably arise due to noisy or low-quality proposals. In fact, invalid relation contexts usually indicate underlying scene misunderstanding and ambiguity, which may, on the contrary, reduce performance in complex scenes. Inspired by recent attention mechanisms such as the Transformer, we propose a novel 3D attention-based relation module (ARM3D). It encompasses object-aware relation reasoning to extract pair-wise relation contexts among qualified proposals and an attention module to distribute attention weights toward different relation contexts. In this way, ARM3D can take full advantage of useful relation contexts and filter out those that are less relevant or even confusing, which mitigates ambiguity in detection. We have evaluated the effectiveness of ARM3D by plugging it into several state-of-the-art 3D object detectors, obtaining more accurate and robust detection results. Extensive experiments show the capability and generalization of ARM3D on 3D object detection. Our source code is available at https://github.com/lanlan96/ARM3D.
Keywords: attention mechanism; scene understanding; relational reasoning; 3D indoor object detection
15. A Comprehensive Review of Group Activity Recognition in Videos (cited: 2)
Authors: Li-Fang Wu, Qi Wang, Meng Jian, Yu Qiao, Bo-Xuan Zhao. International Journal of Automation and Computing (EI, CSCD), 2021, No. 3, pp. 334-350.
Human group activity recognition (GAR) has attracted significant attention from computer vision researchers due to its wide practical applications in security surveillance, social role understanding, and sports video analysis. In this paper, we give a comprehensive overview of the advances in group activity recognition in videos during the past 20 years. First, we provide a summary and comparison of 11 GAR video datasets in this field. Second, we survey group activity recognition methods, including those based on handcrafted features and those based on deep learning networks. For a better understanding of the pros and cons of these methods, we compare various models from the past to the present. Finally, we outline several challenging issues and possible directions for future research. From this comprehensive literature review, readers can obtain an overview of progress in group activity recognition for future studies.
Keywords: group activity recognition (GAR); human activity recognition; scene understanding; video analysis; computer vision
16. An image-based approach to the reconstruction of ancient architectures by extracting and arranging 3D spatial components (cited: 2)
Authors: Divya Udayan J, Hyung Seok KIM, Jee-In KIM. Frontiers of Information Technology & Electronic Engineering (SCIE, EI, CSCD), 2015, No. 1, pp. 12-27.
The objective of this research is the rapid reconstruction of ancient buildings of historical importance from a single image. The key idea of our approach is to reduce the infinite solutions that might otherwise arise when recovering a 3D geometry from 2D photographs. The main outcome of our research shows that the proposed methodology can be used to reconstruct ancient monuments for use as proxies for digital effects in applications such as tourism, games, and entertainment, which do not require very accurate modeling. In this article, we consider the reconstruction of ancient Mughal architecture, including the Taj Mahal. We propose a modeling pipeline that makes an easy reconstruction possible using a single photograph taken from a single view, without the need to create complex point clouds from multiple images or to use laser scanners. First, an initial model is automatically reconstructed using locally fitted planar primitives along with their boundary polygons and the adjacency relation among parts of the polygons. This approach is faster and more accurate than creating a model from scratch because the initial reconstruction phase provides a set of structural information together with the adjacency relation, which makes it possible to estimate the approximate depth of the entire monument. Next, we use manual extrapolation and editing techniques with modeling software to assemble and adjust the different 3D components of the model. This research thus opens up the opportunity for the present generation to experience remote sites of architectural and cultural importance through virtual worlds and real-time mobile applications. Variations of a recreated 3D monument representing an amalgam of various cultures are targeted for future work.
Keywords: digital reconstruction; 3D virtual world; 3D spatial components; vision and scene understanding