The Iterative Closest Point (ICP) scheme has been widely used for the registration of surfaces and point clouds. However, when working on depth image sequences containing large geometric planes with little (or even no) detail, existing ICP algorithms are prone to tangential drift and erroneous rotation estimates due to input device errors. In this paper, we propose a novel ICP algorithm that overcomes these drawbacks and provides significantly more stable registration estimates for simultaneous localization and mapping (SLAM) on RGB-D camera inputs. In our approach, tangential drift and rotation estimation error are reduced by: 1) augmenting the conventional Euclidean distance term with local geometry information, and 2) introducing a new camera stabilization term that prevents improper camera movement during the calculation. Our approach is simple, fast, effective, and readily integrates with previous ICP algorithms. We test our new method with the TUM RGB-D SLAM dataset on state-of-the-art real-time 3D dense reconstruction platforms, i.e., ElasticFusion and Kintinuous. Experiments show that our new strategy outperforms all previous ones on various RGB-D data sequences under different combinations of registration systems and solutions.
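For context, the closed-form rigid alignment (Kabsch/SVD) step at the heart of every ICP iteration can be sketched as follows; the stabilization terms the paper adds sit on top of this standard baseline, which is not taken from the paper itself:

```python
import numpy as np

def best_rigid_transform(src: np.ndarray, dst: np.ndarray):
    """Least-squares rotation R and translation t with R @ src_i + t ~ dst_i.

    This is the closed-form (SVD / Kabsch) solution used inside each ICP
    iteration for matched point pairs, before any extra stabilization terms.
    """
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)                # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))             # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t

rng = np.random.default_rng(0)
src = rng.random((50, 3))
t_true = np.array([0.5, -0.2, 1.0])
R_est, t_est = best_rigid_transform(src, src + t_true)  # pure translation case
assert np.allclose(R_est, np.eye(3), atol=1e-8)
assert np.allclose(t_est, t_true, atol=1e-8)
```

In a full ICP loop this step alternates with nearest-neighbor correspondence search; the paper's contribution modifies the error terms fed into this optimization.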
In this paper, we consider salient instance segmentation. As well as producing bounding boxes, our network also outputs high-quality instance-level segments as initial selections to indicate the regions of interest. Taking into account the category-independent property of each target, we design a single-stage salient instance segmentation framework with a novel segmentation branch. Our new branch regards not only the local context inside each detection window but also its surrounding context, enabling us to distinguish instances in the same scope even with partial occlusion. Our network is end-to-end trainable and fast (running at 40 fps for images with resolution 320 × 320). We evaluate our approach on a publicly available benchmark and show that it outperforms alternative solutions. We also provide a thorough analysis of our design choices to help readers better understand the function of each part of our network. Source code can be found at https://github.com/RuochenFan/S4Net.
The past decade has witnessed the impressive and steady development of single-modal AI technologies in several fields, thanks to the emergence of deep learning. Less studied, however, is multi-modal AI, commonly considered the next generation of AI, which exploits complementary context concealed in different-modality inputs to improve performance. Humans naturally learn to form a global concept from multiple modalities (i.e., sight, hearing, touch, smell, and taste), even when some are incomplete or missing. Thus, in addition to the two popular modalities (vision and language), other types of data such as depth, infrared information, and events are also important for multi-modal learning in real-world scenes.
Funding: supported by the National Natural Science Foundation of China (Nos. 61922046 and 62276145), the National Key Research and Development Program of China (No. 2018AAA0100400), and the Fundamental Research Funds for Central Universities, China (No. 63223049).
Abstract: The SDR-to-HDR translation technique converts abundant standard-dynamic-range (SDR) media resources into high-dynamic-range (HDR) ones, which can represent high-contrast scenes and provide more realistic visual experiences. While recent vision Transformers have achieved promising performance in many low-level vision tasks, few works attempt to leverage Transformers for SDR-to-HDR translation. In this paper, we are among the first to investigate the performance of Transformers for SDR-to-HDR translation. We find that directly using the self-attention mechanism may introduce artifacts into the results, because it models long-range dependencies between the low-frequency and high-frequency components inappropriately. Taking this into account, we advance the self-attention mechanism and present a dual frequency attention (DFA), which leverages self-attention to separately encode low-frequency structural information and high-frequency detail information. Based on the proposed DFA, we further design a multi-scale feature fusion network, named dual frequency Transformer (DFT), for efficient SDR-to-HDR translation. Extensive experiments on the HDRTV1K dataset demonstrate that our DFT achieves better quantitative and qualitative performance than recent state-of-the-art methods. The code of our DFT is publicly available at https://github.com/CS-GangXu/DFT.
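The low/high-frequency split underlying DFA can be illustrated with a toy decomposition: a blurred copy of the image carries the low-frequency structure, and the residual carries the high-frequency detail. The box filter and 3 × 3 window below are illustrative assumptions, not the decomposition actually used in the paper:

```python
import numpy as np

def dual_frequency_split(img: np.ndarray, k: int = 3):
    """Split an image into low- and high-frequency components.

    Low frequency  = k x k box-filtered copy (structural information).
    High frequency = residual (detail information).
    The two components sum back to the original image exactly.
    """
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    h, w = img.shape
    low = np.zeros_like(img, dtype=float)
    for dy in range(k):                 # accumulate the k x k neighborhood
        for dx in range(k):
            low += padded[dy:dy + h, dx:dx + w]
    low /= k * k
    high = img - low
    return low, high

rng = np.random.default_rng(0)
x = rng.random((16, 16))
low, high = dual_frequency_split(x)
assert np.allclose(low + high, x)       # exact reconstruction
assert low.var() < x.var()              # blurring removes high-frequency variance
```

Because high = x − low by construction, the two branches can be attended to separately and summed without losing information.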
Funding: supported by the National Key R&D Program of China (Project No. 2021ZD0112902), the National Natural Science Foundation of China (Project No. 62220106003), and the Tsinghua-Tencent Joint Laboratory for Internet Innovation Technology.
Abstract: While originally designed for natural language processing tasks, the self-attention mechanism has recently taken various computer vision areas by storm. However, the 2D nature of images brings three challenges for applying self-attention in computer vision: (1) treating images as 1D sequences neglects their 2D structures; (2) the quadratic complexity is too expensive for high-resolution images; (3) it only captures spatial adaptability but ignores channel adaptability. In this paper, we propose a novel linear attention named large kernel attention (LKA) to enable self-adaptive and long-range correlations in self-attention while avoiding its shortcomings. Furthermore, we present a neural network based on LKA, namely Visual Attention Network (VAN). While extremely simple, VAN achieves results comparable with similarly sized convolutional neural networks (CNNs) and vision transformers (ViTs) in various tasks, including image classification, object detection, semantic segmentation, panoptic segmentation, pose estimation, etc. For example, VAN-B6 achieves 87.8% accuracy on the ImageNet benchmark, and sets new state-of-the-art performance (58.2% PQ) for panoptic segmentation. Besides, VAN-B2 surpasses Swin-T by 4% mIoU (50.1% vs. 46.1%) for semantic segmentation on the ADE20K benchmark and by 2.6% AP (48.8% vs. 46.2%) for object detection on the COCO dataset. It provides a novel method and a simple yet strong baseline for the community. The code is available at https://github.com/Visual-Attention-Network.
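A sketch of why a decomposed large-kernel design stays cheap: comparing parameter counts of a dense K × K convolution against a depth-wise + depth-wise-dilated + point-wise decomposition. The 21 → 5/7/1 kernel sizes follow common descriptions of LKA and should be treated as assumptions, not figures from this abstract:

```python
def dense_conv_params(channels: int, k: int) -> int:
    # standard k x k convolution mapping `channels` -> `channels` (no bias)
    return channels * channels * k * k

def lka_params(channels: int, dw_k: int = 5, dwd_k: int = 7) -> int:
    # depth-wise conv + depth-wise dilated conv + 1x1 point-wise conv
    return channels * dw_k ** 2 + channels * dwd_k ** 2 + channels * channels

c = 64
dense = dense_conv_params(c, 21)   # 64 * 64 * 21 * 21 = 1,806,336 parameters
lka = lka_params(c)                # 64*25 + 64*49 + 64*64 = 8,832 parameters
assert lka < dense                 # two orders of magnitude cheaper
```

The decomposition keeps a large effective receptive field while the parameter (and FLOP) count grows linearly rather than quadratically in the channel-kernel product, which is the "linear attention" property the abstract claims.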
Funding: supported by the National Key R&D Program of China (Grant No. 2018AAA0100400), the Fundamental Research Funds for the Central Universities (Nankai University, Grant No. 63223050), and the National Natural Science Foundation of China (Grant No. 62176130).
Abstract: Recognizing dynamic variations on the ground, especially changes caused by various natural disasters, is critical for assessing the severity of damage and directing the disaster response. However, current workflows for disaster assessment usually require human analysts to observe and identify damaged buildings, which is labor-intensive and unsuitable for large-scale disaster areas. In this paper, we propose a difference-aware attention network (D2ANet) for simultaneous building localization and multi-level change detection from dual-temporal satellite imagery. Considering that the features of pre- and post-disaster images differ across channels, we develop a dual-temporal aggregation module that uses paired features to excite change-sensitive channels and learn the global change pattern. Since building damage caused by disasters takes diverse forms in complex environments, we design a difference-attention module to exploit local correlations among the multi-level changes, which improves the ability to identify damage at different scales. Extensive experiments on the large-scale building damage assessment dataset xBD demonstrate that our approach provides new state-of-the-art results. Source code is publicly available at https://github.com/mj129/D2ANet.
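A minimal sketch of the channel-excitation idea, not the actual dual-temporal aggregation module: channels whose activations change most between the pre- and post-disaster features receive the largest gating weights.

```python
import numpy as np

def dual_temporal_excitation(pre: np.ndarray, post: np.ndarray) -> np.ndarray:
    """Re-weight channels by how much they changed between the two epochs.

    pre, post: (C, H, W) feature maps. Channels whose activations differ
    most between the pre- and post-disaster features get the largest weights.
    """
    diff = np.abs(post - pre).mean(axis=(1, 2))   # (C,) mean change per channel
    weights = 1.0 / (1.0 + np.exp(-diff))         # sigmoid gating in (0.5, 1)
    return post * weights[:, None, None]

pre = np.zeros((3, 4, 4))
post = np.zeros((3, 4, 4))
post[1] = 5.0                                     # only channel 1 changes
out = dual_temporal_excitation(pre, post)
assert out[1].mean() > 4.9                        # changed channel passes almost intact
assert out[0].sum() == 0.0                        # unchanged zero channel stays zero
```

A learned module would replace the fixed sigmoid with trainable projections, but the gating structure is the same.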
Funding: National Natural Science Foundation of China (Grant Nos. 61521002 and 62132012).
Abstract: Humans can naturally and effectively find salient regions in complex scenes. Motivated by this observation, attention mechanisms were introduced into computer vision with the aim of imitating this aspect of the human visual system. Such an attention mechanism can be regarded as a dynamic weight adjustment process based on features of the input image. Attention mechanisms have achieved great success in many visual tasks, including image classification, object detection, semantic segmentation, video understanding, image generation, 3D vision, multimodal tasks, and self-supervised learning. In this survey, we provide a comprehensive review of various attention mechanisms in computer vision and categorize them according to approach, such as channel attention, spatial attention, temporal attention, and branch attention; a related repository, https://github.com/MenghaoGuo/Awesome-Vision-Attentions, is dedicated to collecting related work. We also suggest future directions for attention mechanism research.
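As a concrete instance of one of the surveyed categories, squeeze-and-excitation-style channel attention can be sketched in a few lines; the bottleneck ratio and random weights below stand in for learned parameters:

```python
import numpy as np

def channel_attention(x: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """Squeeze-and-excitation-style channel attention.

    x: (C, H, W) feature map; w1: (C//r, C) and w2: (C, C//r) stand in for
    the learned bottleneck weights with reduction ratio r.
    """
    squeeze = x.mean(axis=(1, 2))                    # (C,) global context vector
    hidden = np.maximum(w1 @ squeeze, 0.0)           # ReLU bottleneck
    weights = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))   # sigmoid weight per channel
    return x * weights[:, None, None]                # dynamic channel re-weighting

rng = np.random.default_rng(0)
c, r = 8, 2
x = rng.random((c, 6, 6))
w1 = rng.standard_normal((c // r, c))
w2 = rng.standard_normal((c, c // r))
y = channel_attention(x, w1, w2)
assert y.shape == x.shape
assert np.all(y <= x)        # sigmoid weights in (0, 1) only attenuate here
```

Spatial, temporal, and branch attention follow the same "compute weights from features, then re-weight" pattern, but over different axes of the tensor.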
Abstract: Detecting and segmenting salient objects from natural scenes, often referred to as salient object detection, has attracted great interest in computer vision. While many models have been proposed and several applications have emerged, a deep understanding of achievements and issues remains lacking. We aim to provide a comprehensive review of recent progress in salient object detection and situate this field among other closely related areas such as generic scene segmentation, object proposal generation, and saliency for fixation prediction. Covering 228 publications, we survey i) roots, key concepts, and tasks, ii) core techniques and main modeling trends, and iii) datasets and evaluation metrics for salient object detection. We also discuss open problems such as evaluation metrics and dataset bias in model performance, and suggest future research directions.
Funding: This research was sponsored by the National Natural Science Foundation of China under Grant Nos. 61572264 and 61373069, the National Key Research and Development Plan of China under Grant No. 2016YFB1001402, the Huawei Innovation Research Program (HIRP), the China Association for Science and Technology (CAST) Young Talents Plan, and the Tianjin Short-Term Recruitment Program of Foreign Experts.
Abstract: The computer graphics and computer vision communities have been working closely together in recent years, and a variety of algorithms and applications have been developed to analyze and manipulate the visual media around us. There are three major driving forces behind this phenomenon: 1) the availability of big data from the Internet has created a demand for dealing with the ever-increasing, vast amount of resources; 2) powerful processing tools, such as deep neural networks, provide effective ways of learning how to deal with heterogeneous visual data; 3) new data capture devices, such as the Kinect, bridge the gap between algorithms for 2D image understanding and 3D model analysis. These driving forces have emerged only recently, and we believe that the computer graphics and computer vision communities are still at the beginning of their honeymoon phase. In this work we survey recent research on how computer vision techniques benefit computer graphics techniques and vice versa, covering research on analysis, manipulation, synthesis, and interaction. We also discuss existing problems and suggest possible further research directions.
Funding: supported by the National Natural Science Foundation of China (Nos. 61572264 and 61620106008).
Abstract: Training a generic objectness measure to produce object proposals has recently become of significant interest. We observe that generic objects with well-defined closed boundaries can be detected by looking at the norm of gradients, with a suitable resizing of their corresponding image windows to a small fixed size. Based on this observation and for computational reasons, we propose to resize the window to 8 × 8 and use the norm of the gradients as a simple 64D feature to describe it, for explicitly training a generic objectness measure. We further show how the binarized version of this feature, namely binarized normed gradients (BING), can be used for efficient objectness estimation, which requires only a few atomic operations (e.g., add, bitwise shift, etc.). To improve the localization quality of the proposals while maintaining efficiency, we propose a novel fast segmentation method and demonstrate its effectiveness for improving BING's localization performance when used in multi-thresholding straddling expansion (MTSE) post-processing. On the challenging PASCAL VOC2007 dataset, using 1000 proposals per image and an intersection-over-union threshold of 0.5, our proposal method achieves a 95.6% object detection rate and 78.6% mean average best overlap in less than 0.005 seconds per image.
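The 64D normed-gradient feature can be sketched directly from the description above: block-average the window down to 8 × 8, then take a gradient magnitude at each of the 64 cells. The exact resizing and gradient operators here are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def normed_gradient_feature(window: np.ndarray) -> np.ndarray:
    """64D normed-gradient (NG) style feature of an image window.

    The window is block-averaged down to 8 x 8, then the L1 gradient
    magnitude |gx| + |gy| at each of the 64 cells forms the feature.
    Block-average resizing assumes the window size is a multiple of 8.
    """
    h, w = window.shape
    small = window.reshape(8, h // 8, 8, w // 8).mean(axis=(1, 3))   # 8 x 8
    gx = np.abs(np.diff(small, axis=1, append=small[:, -1:]))        # horizontal
    gy = np.abs(np.diff(small, axis=0, append=small[-1:, :]))        # vertical
    return (gx + gy).ravel()                                         # 64 values

window = np.random.default_rng(0).random((32, 40))
feature = normed_gradient_feature(window)
assert feature.shape == (64,)
assert np.all(feature >= 0)
```

Binarizing such a feature (the "BING" step) is what reduces scoring a window to a handful of add and bitwise-shift operations.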
Funding: sponsored by the National Natural Science Foundation of China (Nos. 61620106008 and 61572264) and the Huawei Innovation Research Program (HIRP).
Abstract: In this paper, we reconsider the clustering problem for image over-segmentation from a new perspective. We propose a novel search algorithm called "active search" which explicitly considers neighbor continuity. Based on this search method, we design a back-and-forth traversal strategy and a joint assignment and update step to speed up the algorithm. Compared to earlier methods, such as simple linear iterative clustering (SLIC) and its variants, which use fixed search regions and perform the assignment and update steps separately, our novel scheme reduces the number of iterations required for convergence, and also provides better boundaries in the over-segmentation results. Extensive evaluation using the Berkeley segmentation benchmark verifies that our method outperforms competing methods under various evaluation metrics. In particular, our method is the fastest, achieving approximately 30 fps for a 481 × 321 image on a single CPU core. To facilitate further research, our code is made publicly available.
Funding: supported by a Major Project for a New Generation of AI under Grant No. 2018AAA0100400, the National Natural Science Foundation of China (61922046), and the Tianjin Natural Science Foundation (17JCJQJC43700).
Abstract: Salient object detection, which simulates human visual perception in locating the most significant object(s) in a scene, has been widely applied to various computer vision tasks. Now, the advent of depth sensors means that depth maps can easily be captured; this additional spatial information can boost the performance of salient object detection. Although various RGB-D based salient object detection models with promising performance have been proposed over the past several years, an in-depth understanding of these models and the challenges in this field remains lacking. In this paper, we provide a comprehensive survey of RGB-D based salient object detection models from various perspectives, and review related benchmark datasets in detail. Further, as light fields can also provide depth maps, we review salient object detection models and popular benchmark datasets from this domain too. Moreover, to investigate the ability of existing models to detect salient objects, we have carried out a comprehensive attribute-based evaluation of several representative RGB-D based salient object detection models. Finally, we discuss several challenges and open directions of RGB-D based salient object detection for future research. All collected models, benchmark datasets, datasets constructed for attribute-based evaluation, and related code are publicly available at https://github.com/taozh2017/RGBD-SODsurvey.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 61572264, 61620106008) and the CAST Young Talents Plan.
Abstract: Recent advances in supervised salient object detection modeling have resulted in significant performance improvements on benchmark datasets. However, most existing salient object detection models assume that at least one salient object exists in the input image. Such an assumption often leads to less appealing saliency maps on background images with no salient object at all. Therefore, handling those cases can reduce the false positive rate of a model. In this paper, we propose a supervised learning approach for jointly addressing the salient object detection and existence prediction problems. Given a set of background-only images and images with salient objects, as well as their salient object annotations, we adopt the structural SVM framework and formulate the two problems jointly in a single integrated objective function: saliency labels of superpixels are involved in a classification term conditioned on the salient object existence variable, which in turn depends on both global image and regional saliency features and saliency label assignments. The loss function also considers both image-level and region-level mis-classifications. Extensive evaluation on benchmark datasets validates the effectiveness of our proposed joint approach compared to the baseline and state-of-the-art models.
基金funded by the National Natural Science Foundation of China under project No.61231014 and No.61572264,respectivelysupported by Defense Advanced Research Projects Agency (No.HR001110-C-0034)+1 种基金the National Science Foundation (No.BCS-0827764)the Army Research Office (No.W911NF-08-1-0360)
Abstract: Salient object detection remains one of the most important and active research topics in computer vision, with wide-ranging applications to object recognition, scene understanding, image retrieval, context-aware image editing, image compression, etc. Most existing methods directly determine salient objects by exploring various salient object features. Here, we propose a novel graph-based ranking method to detect and segment the most salient object in a scene according to its relationship to image border (background) regions, i.e., the background feature. Firstly, we use regions/superpixels as graph nodes, which are fully connected to enable both long-range and short-range relations to be modeled. The relationship of each region to the image border (background) is evaluated in two stages: (i) ranking with hard background queries, and (ii) ranking with soft foreground queries. We experimentally show how this two-stage ranking based salient object detection method is complementary to traditional methods, and that integrated results outperform both. Our method allows the exploitation of intrinsic image structure to achieve high-quality salient object determination using a quadratic optimization framework, with a closed-form solution which can be easily computed. Extensive method evaluation and comparison using three challenging saliency datasets demonstrate that our method consistently outperforms 10 state-of-the-art models by a large margin.
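The closed-form solution mentioned above is characteristic of graph-based manifold ranking, where the quadratic objective yields f* = (D − αW)⁻¹y. The sketch below shows only that ranking step under standard assumptions; the paper's actual affinity construction, two-stage hard/soft query scheme, and the value of α are not specified here, so the function name and default are illustrative.

```python
import numpy as np

def rank_regions(W, queries, alpha=0.99):
    """Closed-form graph ranking: solve (D - alpha * W) f = y.

    W:       (N, N) symmetric non-negative affinity matrix over regions
    queries: (N,) boolean mask marking the query (seed) regions
    alpha:   trade-off between graph smoothness and fitting the queries

    Returns ranking scores f; regions strongly connected to the
    queries receive high scores.
    """
    D = np.diag(W.sum(axis=1))          # degree matrix
    y = queries.astype(float)           # indicator vector of queries
    return np.linalg.solve(D - alpha * W, y)
```

In a two-stage scheme like the one described, one would first rank all regions against border (background) queries, then threshold the complemented scores to obtain foreground queries for a second ranking pass.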
Funding: Supported by the Tianjin Natural Science Foundation of China under Grant Nos. 18JCYBJC41300 and 18ZXZNGX00110, and the National Natural Science Foundation of China under Grant No. 61620106008.
Abstract: The Iterative Closest Point (ICP) scheme has been widely used for the registration of surfaces and point clouds. However, when working on depth image sequences where there are large geometric planes with small (or even no) details, existing ICP algorithms are prone to tangential drifting and erroneous rotational estimations due to input device errors. In this paper, we propose a novel ICP algorithm that aims to overcome such drawbacks, and provides significantly stabler registration estimation for simultaneous localization and mapping (SLAM) tasks on RGB-D camera inputs. In our approach, the tangential drifting and the rotational estimation error are reduced by: 1) updating the conventional Euclidean distance term with local geometry information, and 2) introducing a new camera stabilization term that prevents improper camera movement in the calculation. Our approach is simple, fast, effective, and is readily integrable with previous ICP algorithms. We test our new method with the TUM RGB-D SLAM dataset on state-of-the-art real-time 3D dense reconstruction platforms, i.e., ElasticFusion and Kintinuous. Experiments show that our new strategy outperforms all previous ones on various RGB-D data sequences under different combinations of registration systems and solutions.
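To illustrate why "updating the Euclidean distance term with local geometry information" matters, here is a sketch of the classical point-to-plane residual, the standard way to inject surface normals into ICP. This is background for the abstract, not the paper's exact formulation, and the function name is an illustrative assumption.

```python
import numpy as np

def point_to_plane_residuals(src, dst, dst_normals):
    """Point-to-plane ICP residuals: r_i = n_i . (p_i - q_i).

    src:         (N, 3) transformed source points p_i
    dst:         (N, 3) matched destination points q_i
    dst_normals: (N, 3) unit surface normals n_i at the destinations

    Projecting the error onto the destination normal means points can
    slide freely along a large plane without incurring cost, which is
    exactly the degree of freedom where point-to-point (Euclidean)
    residuals cause tangential drift on plane-dominated depth frames.
    """
    return np.einsum('ij,ij->i', dst_normals, src - dst)
```

A minimizer of these residuals constrains only motion along the normals, so a separate regularizer (such as the paper's camera stabilization term) is needed to keep the in-plane motion estimate from wandering.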
基金supported by National Natural Science Foundation of China(61521002,61572264,61620106008)the National Youth Talent Support Program+1 种基金Tianjin Natural Science Foundation(17JCJQJC43700,18ZXZNGX00110)the Fundamental Research Funds for the Central Universities(Nankai University,No.63191501)。
Abstract: In this paper, we consider salient instance segmentation. As well as producing bounding boxes, our network also outputs high-quality instance-level segments as initial selections to indicate the regions of interest. Taking into account the category-independent property of each target, we design a single-stage salient instance segmentation framework with a novel segmentation branch. Our new branch regards not only the local context inside each detection window but also the surrounding context, enabling us to distinguish instances in the same scope even with partial occlusion. Our network is end-to-end trainable and is fast (running at 40 fps for images with resolution 320 × 320). We evaluate our approach on a publicly available benchmark and show that it outperforms alternative solutions. We also provide a thorough analysis of our design choices to help readers better understand the function of each part of our network. Source code can be found at https://github.com/RuochenFan/S4Net.
Abstract: The past decade has witnessed the impressive and steady development of single-modal AI technologies in several fields, thanks to the emergence of deep learning. Less studied, however, is multi-modal AI, commonly considered the next generation of AI, which utilizes complementary context concealed in different-modality inputs to improve performance. Humans naturally learn to form a global concept from multiple modalities (i.e., sight, hearing, touch, smell, and taste), even when some are incomplete or missing. Thus, in addition to the two popular modalities (vision and language), other types of data such as depth, infrared information, and events are also important for multi-modal learning in real-world scenes.