Recently, video object segmentation has received great attention in the computer vision community. Most existing methods rely heavily on pixel-wise human annotations, which are expensive and time-consuming to obtain. To tackle this problem, we make an early attempt to achieve video object segmentation with scribble-level supervision, which greatly reduces the human labor needed to collect manual annotations. However, conventional network architectures and learning objectives do not work well in this scenario because the supervision is highly sparse and incomplete. To address this issue, this paper introduces two novel elements for learning the video object segmentation model. The first is the scribble attention module, which captures more accurate context information and learns an effective attention map to enhance the contrast between foreground and background. The other is the scribble-supervised loss, which can optimize the unlabeled pixels and dynamically correct inaccurately segmented areas during training. To evaluate the proposed method, we conduct experiments on two video object segmentation benchmark datasets, YouTube-video object segmentation (VOS) and densely annotated video segmentation (DAVIS)-2017. We first generate the scribble annotations from the original per-pixel annotations. Then, we train our model and compare its test performance with baseline models and other existing works. Extensive experiments demonstrate that the proposed method works effectively and approaches the performance of methods that require dense per-pixel annotations.
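To illustrate why sparse scribbles are a weak training signal, the following minimal sketch computes a cross-entropy term over scribble-labeled pixels only. It is a common baseline for scribble supervision, not the scribble-supervised loss of the paper (which, per the abstract, also optimizes unlabeled pixels). The `IGNORE` label and the function name are hypothetical.

```python
import math

IGNORE = -1  # hypothetical label for pixels that carry no scribble


def partial_cross_entropy(probs, scribbles):
    """Mean binary cross-entropy over scribble-labeled pixels only.

    probs: per-pixel foreground probabilities, each strictly in (0, 1).
    scribbles: per-pixel labels, 1 = foreground, 0 = background,
               IGNORE = unlabeled.
    """
    total, count = 0.0, 0
    for p, y in zip(probs, scribbles):
        if y == IGNORE:
            continue  # unlabeled pixels contribute nothing to this term
        total += -math.log(p if y == 1 else 1.0 - p)
        count += 1
    return total / count if count else 0.0
```

Because every unlabeled pixel is skipped, most of the image receives no gradient from this term, which is the gap a loss that also covers unlabeled pixels would need to close.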
In this paper, an efficient compressed-domain moving object segmentation algorithm is proposed, in which the motion vector (MV) field parsed from the compressed video is the only cue used for moving object segmentation. First, the MV field is temporally and spatially normalized and then accumulated by an iterative backward projection to enhance salient motions and alleviate noisy MVs. The accumulated MV field is then segmented into motion-homogeneous regions using a modified statistical region growing approach. Finally, moving object regions are extracted in turn by minimizing the joint prediction error using the estimated motion models of two region sets, one containing the candidate object region and the other containing the remaining regions. Experimental results on several H.264 compressed video sequences demonstrate good segmentation performance.
This paper concerns the problem of real-time object segmentation for picking systems. A region proposal method inspired by the human glance and based on a convolutional neural network is proposed to select promising regions, so that further processing is reserved for these regions only. The region proposal method significantly improves the speed of object segmentation. By combining the CNN-based region proposal method with a superpixel method, the category and location information can be used to segment objects, and image redundancy is significantly reduced. This reduces the processing time considerably and makes real-time operation possible. Experiments show that the proposed method can segment the target object of interest in real time on an ordinary laptop.
Small storage space for photographs in formal documents is increasingly necessary given today's demand for communicating and storing huge amounts of data. Traditional compression algorithms do not sufficiently exploit the distinctive properties of formal photographs, namely that the object is an image of the human head and the background is a single color. As a result, compression efficiency is low and the compressed image still consumes considerable space. This paper presents an image compression algorithm based on object segmentation for practical high-efficiency applications. To achieve high coding efficiency, shape-adaptive discrete wavelet transforms are used to transform arbitrarily shaped objects. The areas of the human head and its background are compressed separately to reduce the coding redundancy of the background. Two methods, lossless image contour coding based on differential chain codes and a modified set partitioning in hierarchical trees (SPIHT) algorithm for arbitrary shapes, are discussed in detail. Experimental results show that at 0.078 bits per pixel (bpp), the peak signal-to-noise ratio (PSNR) of the reconstructed photograph exceeds that of standard SPIHT by nearly 4 dB.
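The two quantities reported above, bits per pixel and PSNR, follow standard definitions. The sketch below assumes 8-bit images flattened into lists of pixel values; the function names are illustrative and not taken from the paper.

```python
import math


def psnr(original, reconstructed, max_value=255):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    if not original or len(original) != len(reconstructed):
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10 * math.log10(max_value ** 2 / mse)


def bits_per_pixel(compressed_bytes, width, height):
    """Average number of coded bits spent on each pixel."""
    return 8 * compressed_bytes / (width * height)
```

A gain of about 4 dB in PSNR at a fixed bit rate corresponds to reducing the mean squared error by a factor of roughly 2.5.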
With the development of the modern information society, more and more multimedia information is becoming available, so multimedia processing technology is becoming an important task for scientists in the relevant fields. Among multimedia, visual information is the most attractive because of its direct and vivid character, but at the same time the huge amount of video data poses many challenges for video storage, processing, and transmission.
Current mainstream unsupervised video object segmentation (UVOS) approaches typically incorporate optical flow as motion information to locate the primary objects in coherent video frames. However, they fuse appearance and motion information without evaluating the quality of the optical flow. When poor-quality optical flow interacts with the appearance information, it introduces significant noise and degrades overall performance. To alleviate this issue, we first employ a quality evaluation module (QEM) to evaluate the optical flow. Then, we select high-quality optical flow as motion cues to fuse with the appearance information, which prevents poor-quality optical flow from diverting the network's attention. Moreover, we design an appearance-guided fusion module (AGFM) to better integrate appearance and motion information. Extensive experiments on several widely used datasets, including DAVIS-16, FBMS-59, and YouTube-Objects, demonstrate that the proposed method outperforms existing methods.
Moving object segmentation (MOS), which aims to segment moving objects from video frames, is an important and challenging task in computer vision with various applications. With the development of deep learning (DL), MOS has also entered the era of deep models oriented toward spatiotemporal feature learning. This paper provides an up-to-date review of DL-based MOS methods proposed during the past three years. Specifically, we present a more up-to-date categorization based on model characteristics, then compare and discuss each category from the perspectives of feature learning (FL) and of model training and evaluation. For FL, the reviewed methods are divided into three types, spatial FL, temporal FL, and spatiotemporal FL, and are analyzed in terms of input and model architecture; three input types and four typical preprocessing subnetworks are summarized. In terms of training, we discuss ideas for enhancing model transferability. In terms of evaluation, building on a previous categorization into scene-dependent and scene-independent evaluation, and considering whether the videos used are recorded with static or moving cameras, we further provide four subdivided evaluation setups and analyze which of them the reviewed methods use. We also show performance comparisons of some reviewed MOS methods and analyze their technical advantages and disadvantages. Finally, based on the above comparisons and discussions, we present research prospects and future directions.
Previous video object segmentation approaches mainly focus on simplex solutions linking appearance and motion, limiting effective feature collaboration between these two cues. In this work, we study a novel and efficient full-duplex strategy network (FSNet) to address this issue by considering a better mutual restraint scheme linking motion and appearance, allowing exploitation of cross-modal features from the fusion and decoding stages. Specifically, we introduce a relational cross-attention module (RCAM) to achieve bidirectional message propagation across embedding sub-spaces. To improve the model’s robustness and update inconsistent features from the spatiotemporal embeddings, we adopt a bidirectional purification module after the RCAM. Extensive experiments on five popular benchmarks show that our FSNet is robust to various challenging scenarios (e.g., motion blur and occlusion) and compares well to leading methods for both video object segmentation and video salient object detection. The project is publicly available at https://github.com/GewelsJI/FSNet.
We present a lightweight and efficient semisupervised video object segmentation network based on the space-time memory framework. To some extent, our method solves two difficulties encountered in traditional video object segmentation: the single-frame computation time is too long, and the segmentation of the current frame should use more information from past frames. The algorithm uses a global context (GC) module to achieve high-performance, real-time segmentation. The GC module can effectively integrate multi-frame image information without increasing memory use and can process each frame in real time. Moreover, the predicted mask of the previous frame is helpful for segmenting the current frame, so we feed it into a spatial constraint module (SCM), which constrains the areas of segments in the current frame. The SCM effectively alleviates mismatching of similar targets yet consumes few additional resources. We add a refinement module to the decoder to improve boundary segmentation. Our model achieves state-of-the-art results on various datasets, scoring 80.1% on YouTube-VOS 2018 and a J&F score of 78.0% on DAVIS 2017, while taking 0.05 s per frame on the DAVIS 2016 validation dataset.
Cancer is one of the leading causes of death in the world, with radiotherapy as one of the treatment options. Radiotherapy planning starts with delineating the affected area from healthy organs, called organs at risk (OAR). A new approach to automatic OAR segmentation in the chest cavity in computed tomography (CT) images is presented. The proposed approach is based on a modified U-Net architecture with a ResNet-34 encoder, which is the baseline adopted in this work. A new two-branch CS-SA U-Net architecture is proposed, which consists of two parallel U-Net models in which self-attention blocks with cosine similarity as the query-key similarity function (CS-SA blocks) are inserted between the encoder and decoder, which enables the use of consistency regularisation. The proposed solution demonstrates state-of-the-art performance for OAR segmentation in CT images on the publicly available SegTHOR benchmark dataset in terms of the Dice coefficient (oesophagus: 0.8714, heart: 0.9516, trachea: 0.9286, aorta: 0.9510) and Hausdorff distance (oesophagus: 0.2541, heart: 0.1514, trachea: 0.1722, aorta: 0.1114) and significantly outperforms the baseline. The approach is shown to be viable for improving the quality of OAR segmentation for radiotherapy planning.
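For reference, the Dice coefficient reported above measures the overlap between a predicted mask and a reference mask. The sketch below uses the standard definition on flattened binary masks; returning 1.0 when both masks are empty is an assumed convention, not something stated in the abstract.

```python
def dice_coefficient(pred, target):
    """Dice overlap 2|A ∩ B| / (|A| + |B|) for two flattened binary masks."""
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    size_sum = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if size_sum == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2 * intersection / size_sum
```

For example, a prediction of two pixels that covers a one-pixel reference gives 2 · 1 / (2 + 1), or about 0.667.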
Moving object segmentation is one of the most challenging issues in computer vision. In this paper, we propose a new algorithm for static-camera foreground segmentation. It combines a Gaussian mixture model (GMM) with the active contours method and produces much better results than conventional background subtraction methods. It formulates foreground segmentation as an energy minimization problem and minimizes the energy function using a curve evolution method. Our algorithm integrates the GMM background model, a shadow elimination term, and a curve evolution edge stopping term into the energy function. It achieves more accurate segmentation than existing methods of the same type. Promising results on real images demonstrate the potential of the presented method.
While the development of particular video segmentation algorithms has attracted considerable research interest, relatively little effort has been devoted to providing a methodology for evaluating their performance. In this paper, we propose a methodology for objectively evaluating video segmentation algorithms against ground truth, based on computing the deviation of segmentation results from the reference segmentation. Four different metrics, based respectively on pixel classification, edges, relative foreground area, and relative position, are combined to assess spatial accuracy. Temporal coherency is evaluated using the difference in spatial accuracy between successive frames. The experimental results show the feasibility of our approach. Moreover, it is computationally more efficient than previous methods. It can be applied to provide an offline ranking of different segmentation algorithms and to optimally set the parameters of a given algorithm.
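The idea of deriving temporal coherency from frame-to-frame changes in spatial accuracy can be sketched as follows. This is only an illustration under simplifying assumptions: it uses a single spatial metric (the fraction of misclassified pixels) rather than the four combined metrics of the paper, and the function names are hypothetical.

```python
def misclassification_rate(result, reference):
    """Fraction of pixels whose binary label differs from the reference mask."""
    wrong = sum(1 for r, g in zip(result, reference) if bool(r) != bool(g))
    return wrong / len(reference)


def temporal_instability(results, references):
    """Mean absolute change in spatial error between successive frames.

    results, references: lists of flattened binary masks, one per frame.
    Lower values indicate a more temporally coherent segmentation.
    """
    errors = [misclassification_rate(r, g) for r, g in zip(results, references)]
    diffs = [abs(b - a) for a, b in zip(errors, errors[1:])]
    return sum(diffs) / len(diffs) if diffs else 0.0
```

A sequence whose per-frame error flickers between 0 and 0.25 scores 0.25, whereas a sequence with a constant error scores 0 regardless of how large that error is, which is why a spatial accuracy measure is still needed alongside it.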
A novel moving object segmentation method is proposed in this paper. A modified three-dimensional recursive search (3DRS) algorithm is used to obtain accurate motion information. A motion feature descriptor (MFD) is designed to describe the motion feature of each block in a picture based on motion intensity, motion in occlusion areas, and motion correlation among neighbouring blocks. Then, a fuzzy C-means (FCM) clustering algorithm is applied to those MFDs to segment moving objects. Moreover, a new parameter called the gathering degree is used to distinguish foreground moving objects from background motion. Experimental results demonstrate the effectiveness of the proposed method.
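The fuzzy C-means step can be illustrated with a minimal sketch of the standard algorithm. It assumes scalar features (for example, one motion-intensity value per block) in place of the paper's full motion feature descriptors, and the initial centers are supplied by the caller; all names are illustrative.

```python
def fuzzy_c_means(values, centers, m=2.0, iterations=50):
    """Standard fuzzy C-means on scalar features.

    values: one feature per block; centers: initial cluster centers;
    m > 1 is the fuzzifier; iterations must be at least 1.
    Returns (centers, memberships), where memberships[j][i] is the degree
    to which values[j] belongs to cluster i, as computed in the last pass.
    """
    centers = list(centers)
    power = 2.0 / (m - 1.0)
    memberships = []
    for _ in range(iterations):
        memberships = []
        for x in values:
            dists = [abs(x - c) for c in centers]
            if 0.0 in dists:
                # A point lying exactly on a center belongs fully to it.
                row = [1.0 if d == 0.0 else 0.0 for d in dists]
                total = sum(row)
                row = [r / total for r in row]
            else:
                row = [1.0 / sum((d_i / d_k) ** power for d_k in dists)
                       for d_i in dists]
            memberships.append(row)
        # Each center moves to the membership-weighted mean of the data.
        centers = [
            sum(u[i] ** m * x for u, x in zip(memberships, values))
            / sum(u[i] ** m for u in memberships)
            for i in range(len(centers))
        ]
    return centers, memberships
```

Unlike hard clustering, every block keeps a graded membership in each cluster, so blocks on an object boundary can be resolved later by a further criterion such as the gathering degree mentioned above.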
3D object recognition is a challenging task for intelligent and robotic systems in industrial and home indoor environments. It is critical for such systems to recognize and segment the 3D object instances that they encounter frequently. The computer vision, graphics, and machine learning fields have all given it a lot of attention. Traditionally, 3D segmentation was done with hand-crafted features and designed approaches that did not achieve acceptable performance and could not be generalized to large-scale data. Following their great success in 2D computer vision, deep learning approaches have lately become the preferred method for 3D segmentation challenges. However, the task of instance segmentation is currently less explored. In this paper, we propose a novel deep-learning-based approach for efficient 3D instance segmentation using red, green, blue, and depth (RGB-D) data. The 2D region-based convolutional neural network (Mask R-CNN) deep learning model with a point-based rendering module is adapted to integrate depth information in order to recognize and segment 3D instances of objects. To generate 3D point cloud coordinates (x, y, z), the segmented 2D pixels (u, v) of recognized object regions in the RGB image are merged with the (u, v) points of the depth image. Moreover, we conducted an experiment and analysis to evaluate our proposed method from various viewpoints and distances. The experiments show that the proposed 3D object recognition and instance segmentation are sufficiently beneficial to support object handling in robotic and intelligent systems.
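The step from segmented (u, v) pixels plus depth to (x, y, z) points is commonly done with the pinhole camera model. The sketch below assumes that model, an RGB image already aligned with the depth image, and known intrinsics `fx`, `fy`, `cx`, `cy`; the abstract does not state the paper's exact procedure, so this is an assumption.

```python
def backproject(u, v, depth, fx, fy, cx, cy):
    """Map pixel (u, v) with depth z to camera-frame (x, y, z), pinhole model."""
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return (x, y, z)


def mask_to_points(mask_pixels, depth_map, fx, fy, cx, cy):
    """Convert the pixels of one segmented instance into 3D points.

    mask_pixels: iterable of (u, v) pixels inside the instance mask.
    depth_map: 2D list indexed as depth_map[v][u] (row = v, column = u).
    Pixels with non-positive depth are treated as invalid and skipped.
    """
    points = []
    for u, v in mask_pixels:
        z = depth_map[v][u]
        if z > 0:
            points.append(backproject(u, v, z, fx, fy, cx, cy))
    return points
```

A pixel at the principal point maps to (0, 0, z), and lateral offsets scale linearly with depth, so the same mask covers a larger physical area when the object is farther away.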
A fast interactive segmentation algorithm for image sequences based on relative fuzzy connectedness is presented. Compared with the original algorithm, the proposed one achieves the same accuracy while speeding up the segmentation of a single image by a factor of three. This fast segmentation algorithm is also extended from a single object to multiple objects and from a single image to image sequences. Thus, the segmentation of multiple objects from a complex background and the batch segmentation of image sequences can be achieved. In addition, a post-processing scheme is incorporated into the algorithm, which extracts a smooth, one-pixel-wide edge for each segmented object. The experimental results illustrate that the proposed algorithm can obtain the object regions of interest from medical images or image sequences, as well as from man-made images, quickly and reliably with only a little interaction.
Undeniably, deep learning (DL) has rapidly displaced traditional machine learning in the remote sensing (RS) and geoscience domains, with applications such as scene understanding, material identification, extreme weather detection, and oil spill identification, among many others. Traditional machine learning algorithms are given less and less attention in the era of big data. Recently, a substantial amount of work has aimed at developing image classification approaches based on the success of DL models in computer vision. The number of relevant articles has nearly doubled every year since 2015. Advances in remote sensing technology, as well as the rapidly expanding volume of publicly available satellite imagery on a worldwide scale, have opened up possibilities for a wide range of modern applications. However, there are challenges related to the availability of annotated data, the complex nature of the data, and model parameterization, all of which strongly impact performance. In this article, a comprehensive review of the literature encompassing a broad spectrum of pioneering work in remote sensing image classification is presented, including network architectures (vintage convolutional neural networks, CNN; fully convolutional networks, FCN; encoder-decoder and recurrent networks; attention models; and generative adversarial models). The characteristics, capabilities, and limitations of current DL models are examined, and potential research directions are discussed.
Computer vision systems have spread impressively in both practical applications and theoretical research. A common approach in such systems relies on a good segmentation of moving objects from video sequences. This paper presents an automatic algorithm for segmenting and extracting moving objects that is suitable for indoor and outdoor video applications in which the background scene can be captured beforehand. Since edge detection is often used to extract accurate boundaries of the objects in an image, the first step of our algorithm combines two edge maps, one detected from the frame difference of two consecutive frames and the other from the background subtraction. After edge points that belong to the background are removed, the resulting moving edge map is fed to the object extraction step. A fundamental task in this step is to declare the candidates for the moving object, followed by the application of morphological operations. The algorithm is applied to a real video sequence as well as an MPEG-4 sequence, and good segmentation results are achieved.
Interactive image segmentation (IIS) is an important technique for obtaining pixel-level annotations. In many cases, target objects share similar semantics. However, IIS methods neglect this connection, and in particular the cues provided by representations of previously segmented objects, previous user interaction, and previous prediction masks, which can all provide suitable priors for the current annotation. In this paper, we formulate a sequential interactive image segmentation (SIIS) task for minimizing user interaction when segmenting sequences of related images, and we provide a practical approach to this task using two pertinent designs. The first is a novel interaction mode. When annotating a new sample, our method can automatically propose an initial click based on previous annotations. This dramatically reduces the interaction burden on the user. The second is an online optimization strategy, which aims to provide semantic information when annotating specific targets by optimizing the model with dense supervision from previously labeled samples. Experiments demonstrate the effectiveness of regarding SIIS as a distinct task and of our methods for addressing it.
In this paper, we present a simultaneous segmentation algorithm for multiple highly occluded objects, which combines high-level knowledge and low-level information in a unified framework. The high-level knowledge provides sophisticated shape priors that take into account the blocking relationship between nearby objects. Unlike the conventional layered model, which attempts to solve the full ordering problem, we decompose the problem into a series of pairwise ones, which makes our algorithm scalable to a large number of objects. Objects are segmented at the pixel level, with higher-order soft constraints from superpixels, by a dual-level conditional random field. The model is optimized by alternating between object layout and pixel-wise segmentation. We evaluate our system on different objects, i.e., clothing and pedestrians, and show impressive segmentation results and significant improvement over state-of-the-art segmentation algorithms.
Moving object segmentation (MOS) is one of the essential functions of the vision system of all robots, including medical robots. Deep learning-based MOS methods, especially deep end-to-end MOS methods, are actively investigated in this field. Foreground segmentation networks (FgSegNets) are representative deep end-to-end MOS methods proposed recently. This study explores a new mechanism to improve the spatial feature learning capability of FgSegNets with relatively few additional parameters. Specifically, we propose an enhanced attention (EA) module, a parallel connection of an attention module and a lightweight enhancement module, with sequential attention and residual attention as special cases. We also propose integrating EA with FgSegNet_v2 by taking the lightweight convolutional block attention module as the attention module and plugging the EA module in after the two max-pooling layers of the encoder. The resulting model is named FgSegNet_v2 EA. The ablation study verifies the effectiveness of the proposed EA module and integration strategy. The results on the CDnet2014 dataset, which depicts human activities and vehicles captured in different scenes, show that FgSegNet_v2 EA outperforms FgSegNet_v2 by 0.08% and 14.5% under scene-dependent and scene-independent evaluation, respectively, which indicates the positive effect of EA on the spatial feature learning capability of FgSegNet_v2.
Funding: Supported in part by the National Key R&D Program of China (2017YFB0502904) and the National Science Foundation of China (61876140).
Abstract: Recently, video object segmentation has received great attention in the computer vision community. Most existing methods rely heavily on pixel-wise human annotations, which are expensive and time-consuming to obtain. To tackle this problem, we make an early attempt to achieve video object segmentation with scribble-level supervision, which greatly reduces the human labor needed to collect manual annotations. However, conventional network architectures and learning objectives do not work well in this scenario because the supervision is highly sparse and incomplete. To address this issue, this paper introduces two novel elements for learning the video object segmentation model. The first is the scribble attention module, which captures more accurate context information and learns an effective attention map to enhance the contrast between foreground and background. The other is the scribble-supervised loss, which can optimize the unlabeled pixels and dynamically correct inaccurately segmented areas during training. To evaluate the proposed method, we conduct experiments on two video object segmentation benchmark datasets, YouTube-video object segmentation (VOS) and densely annotated video segmentation (DAVIS)-2017. We first generate the scribble annotations from the original per-pixel annotations. Then, we train our model and compare its test performance with baseline models and other existing works. Extensive experiments demonstrate that the proposed method works effectively and approaches the performance of methods that require dense per-pixel annotations.
Funding: Project supported by the National Natural Science Foundation of China (Grant No. 60572127), the Development Foundation of Shanghai Municipal Commission of Education (Grant No. 05AZ43), and the Shanghai Leading Academic Discipline Project (Grant No. T0102).
Abstract: In this paper, an efficient compressed-domain moving object segmentation algorithm is proposed, in which the motion vector (MV) field parsed from the compressed video is the only cue used for moving object segmentation. First, the MV field is temporally and spatially normalized and then accumulated by an iterative backward projection to enhance salient motions and alleviate noisy MVs. The accumulated MV field is then segmented into motion-homogeneous regions using a modified statistical region growing approach. Finally, moving object regions are extracted in turn by minimizing the joint prediction error using the estimated motion models of two region sets, one containing the candidate object region and the other containing the remaining regions. Experimental results on several H.264 compressed video sequences demonstrate good segmentation performance.
Funding: Supported by the National Natural Science Foundation of China (61233010, 61305106), the Shanghai Natural Science Foundation (17ZR1409700, 18ZR1415300), and the basic research project of the Shanghai Municipal Science and Technology Commission (16JC1400900).
Abstract: This paper concerns the problem of real-time object segmentation for picking systems. A region proposal method inspired by the human glance and based on a convolutional neural network is proposed to select promising regions, so that further processing is reserved for these regions only. The region proposal method significantly improves the speed of object segmentation. By combining the CNN-based region proposal method with a superpixel method, the category and location information can be used to segment objects, and image redundancy is significantly reduced. This reduces the processing time considerably and makes real-time operation possible. Experiments show that the proposed method can segment the target object of interest in real time on an ordinary laptop.
Funding: This work was supported by the National Natural Science Foundation of China (No. 60372066).
Abstract: Small storage space for photographs in formal documents is increasingly necessary given today's demand for communicating and storing huge amounts of data. Traditional compression algorithms do not sufficiently exploit the distinctive properties of formal photographs, namely that the object is an image of the human head and the background is a single color. As a result, compression efficiency is low and the compressed image still consumes considerable space. This paper presents an image compression algorithm based on object segmentation for practical high-efficiency applications. To achieve high coding efficiency, shape-adaptive discrete wavelet transforms are used to transform arbitrarily shaped objects. The areas of the human head and its background are compressed separately to reduce the coding redundancy of the background. Two methods, lossless image contour coding based on differential chain codes and a modified set partitioning in hierarchical trees (SPIHT) algorithm for arbitrary shapes, are discussed in detail. Experimental results show that at 0.078 bits per pixel (bpp), the peak signal-to-noise ratio (PSNR) of the reconstructed photograph exceeds that of standard SPIHT by nearly 4 dB.
Abstract: With the development of the modern information society, more and more multimedia information is becoming available, so multimedia processing technology is becoming an important task for scientists in the relevant fields. Among multimedia, visual information is the most attractive because of its direct and vivid character, but at the same time the huge amount of video data poses many challenges for video storage, processing, and transmission.
Funding: Supported by the National Natural Science Foundation of China (No. 61872189).
Abstract: Current mainstream unsupervised video object segmentation (UVOS) approaches typically incorporate optical flow as motion information to locate the primary objects in coherent video frames. However, they fuse appearance and motion information without evaluating the quality of the optical flow. When poor-quality optical flow interacts with the appearance information, it introduces significant noise and degrades overall performance. To alleviate this issue, we first employ a quality evaluation module (QEM) to evaluate the optical flow. We then select high-quality optical flow as motion cues to fuse with the appearance information, which prevents poor-quality optical flow from diverting the network's attention. Moreover, we design an appearance-guided fusion module (AGFM) to better integrate appearance and motion information. Extensive experiments on several widely used datasets, including DAVIS-16, FBMS-59, and YouTube-Objects, demonstrate that the proposed method outperforms existing methods.
Funding: National Natural Science Foundation of China (Nos. 61702323 and 62172268); the Shanghai Municipal Natural Science Foundation, China (No. 20ZR1423100); the Open Fund of Science and Technology on Thermal Energy and Power Laboratory (No. TPL2020C02), Wuhan 2nd Ship Design and Research Institute, Wuhan, China; the National Key Research and Development Program of China (No. 2018YFB1306303); the Major Basic Research Projects of Natural Science Foundation of Shandong Province, China (No. ZR2019ZD07).
Abstract: Moving object segmentation (MOS), which aims to segment moving objects from video frames, is an important and challenging task in computer vision with various applications. With the development of deep learning (DL), MOS has also entered the era of deep models for spatiotemporal feature learning. This paper reviews DL-based MOS methods proposed during the past three years. Specifically, we present a more up-to-date categorization based on model characteristics, then compare and discuss each category from the perspectives of feature learning (FL) and of model training and evaluation. For FL, the reviewed methods are divided into three types: spatial FL, temporal FL, and spatiotemporal FL. They are then analyzed in terms of input and model architecture, and three input types and four typical preprocessing subnetworks are summarized. In terms of training, we discuss ideas for enhancing model transferability. In terms of evaluation, building on a previous categorization into scene-dependent and scene-independent evaluation, and taking into account whether the videos used are recorded with static or moving cameras, we provide four subdivided evaluation setups and analyze the setups used by the reviewed methods. We also show performance comparisons of some reviewed MOS methods and analyze their technical advantages and disadvantages. Finally, based on these comparisons and discussions, we present research prospects and future directions.
Funding: This work was supported by the National Natural Science Foundation of China (62176169, 61703077, and 62102207).
Abstract: Previous video object segmentation approaches mainly focus on simplex solutions linking appearance and motion, which limits effective feature collaboration between these two cues. In this work, we study a novel and efficient full-duplex strategy network (FSNet) to address this issue by considering a better mutual restraint scheme linking motion and appearance, allowing exploitation of cross-modal features from the fusion and decoding stages. Specifically, we introduce a relational cross-attention module (RCAM) to achieve bidirectional message propagation across embedding sub-spaces. To improve the model's robustness and update inconsistent features from the spatiotemporal embeddings, we adopt a bidirectional purification module after the RCAM. Extensive experiments on five popular benchmarks show that our FSNet is robust to various challenging scenarios (e.g., motion blur and occlusion) and compares well to leading methods for both video object segmentation and video salient object detection. The project is publicly available at https://github.com/GewelsJI/FSNet.
Funding: Partially supported by the National Natural Science Foundation of China (Grant Nos. 61802197, 62072449, and 61632003); the Science and Technology Development Fund, Macao SAR (Grant Nos. 0018/2019/AKP and SKL-IOTSC(UM)-2021-2023); the Guangdong Science and Technology Department (Grant No. 2020B1515130001); University of Macao (Grant Nos. MYRG2020-00253-FST and MYRG2022-00059-FST).
Abstract: We present a lightweight and efficient semi-supervised video object segmentation network based on the space-time memory framework. To some extent, our method solves two difficulties encountered in traditional video object segmentation: the single-frame calculation time is too long, and the segmentation of the current frame should use more information from past frames. The algorithm uses a global context (GC) module to achieve high-performance, real-time segmentation. The GC module can effectively integrate multi-frame image information without increased memory and can process each frame in real time. Moreover, the prediction mask of the previous frame is helpful for the segmentation of the current frame, so we input it into a spatial constraint module (SCM), which constrains the areas of segments in the current frame. The SCM effectively alleviates mismatching of similar targets yet consumes few additional resources. We add a refinement module to the decoder to improve boundary segmentation. Our model achieves state-of-the-art results on various datasets, scoring 80.1% on YouTube-VOS 2018 and a J&F score of 78.0% on DAVIS 2017, while taking 0.05 s per frame on the DAVIS 2016 validation dataset.
Funding: The PID2022-137451OB-I00 and PID2022-137629OA-I00 projects, funded by MICIU/AEI/10.13039/501100011033 and by ERDF/EU.
Abstract: Cancer is one of the leading causes of death in the world, with radiotherapy as one of the treatment options. Radiotherapy planning starts with delineating the affected area from healthy organs, called organs at risk (OAR). A new approach to automatic OAR segmentation in the chest cavity in Computed Tomography (CT) images is presented. The proposed approach is based on the modified U-Net architecture with the ResNet-34 encoder, which is the baseline adopted in this work. The new two-branch CS-SA U-Net architecture is proposed. It consists of two parallel U-Net models in which self-attention blocks with cosine similarity as the query-key similarity function (CS-SA blocks) are inserted between the encoder and decoder, which enables the use of consistency regularisation. The proposed solution demonstrates state-of-the-art performance for OAR segmentation in CT images on the publicly available SegTHOR benchmark dataset in terms of Dice coefficient (oesophagus: 0.8714, heart: 0.9516, trachea: 0.9286, aorta: 0.9510) and Hausdorff distance (oesophagus: 0.2541, heart: 0.1514, trachea: 0.1722, aorta: 0.1114), and significantly outperforms the baseline. The approach is shown to be viable for improving the quality of OAR segmentation for radiotherapy planning.
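For readers unfamiliar with the Dice coefficient reported above, it is defined as 2|A ∩ B| / (|A| + |B|) for a predicted mask A and a reference mask B. The following is a minimal sketch for binary 2D masks stored as nested lists; the function name and the handling of two empty masks are assumptions made for illustration and are not taken from the paper or the SegTHOR evaluation code.

```python
def dice_coefficient(pred, target):
    """Dice = 2|A ∩ B| / (|A| + |B|) for two binary masks of equal shape."""
    intersection = pred_sum = target_sum = 0
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            intersection += 1 if (p and t) else 0
            pred_sum += 1 if p else 0
            target_sum += 1 if t else 0
    if pred_sum + target_sum == 0:
        return 1.0  # both masks empty: treated here as perfect agreement (a convention)
    return 2.0 * intersection / (pred_sum + target_sum)
```

In a multi-organ setting such as the one described, the score would typically be computed once per organ label, which matches the per-organ values listed in the abstract.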
Funding: Supported by the National Basic Research Program of China (Grant No. 2006CB303105); the Chinese Ministry of Education Innovation Team Fund Project (Grant No. IRT0707); the National Natural Science Foundation of China (Grant Nos. 60673109 and 60801053); Beijing Excellent Doctoral Thesis Program (Grant No. YB20081000401); Beijing Municipal Natural Science Foundation (Grant No. 4082025); Doctoral Foundation of China (Grant No. 20070004037).
Abstract: Moving object segmentation is one of the most challenging issues in computer vision. In this paper, we propose a new algorithm for foreground segmentation with a static camera. It combines a Gaussian mixture model (GMM) with the active contours method and produces much better results than conventional background subtraction methods. It formulates foreground segmentation as an energy minimization problem and minimizes the energy function using curve evolution. The algorithm integrates the GMM background model, a shadow elimination term and a curve evolution edge stopping term into the energy function. It achieves more accurate segmentation than existing methods of the same type. Promising results on real images demonstrate the potential of the presented method.
Abstract: While the development of particular video segmentation algorithms has attracted considerable research interest, relatively little effort has been devoted to providing a methodology for evaluating their performance. In this paper, we propose a methodology to objectively evaluate video segmentation algorithms against ground truth, based on computing the deviation of segmentation results from the reference segmentation. Four different metrics, based on pixel classification, edges, relative foreground area and relative position respectively, are combined to assess spatial accuracy. Temporal coherency is evaluated using the difference in spatial accuracy between successive frames. The experimental results show the feasibility of our approach. Moreover, it is computationally more efficient than previous methods. It can be applied to provide an offline ranking among different segmentation algorithms and to optimally set the parameters for a given algorithm.
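The evaluation idea in this abstract can be sketched with one spatial measure and one temporal measure. The abstract does not give the exact formulas, so the code below is only an assumed stand-in: it uses plain pixel classification accuracy as the spatial score (one of the four cues named) and the mean absolute change of that score between successive frames as a temporal measure. All function names are hypothetical.

```python
def pixel_accuracy(mask, reference):
    """Fraction of pixels whose foreground/background label matches the reference."""
    correct = total = 0
    for m_row, r_row in zip(mask, reference):
        for m, r in zip(m_row, r_row):
            correct += 1 if bool(m) == bool(r) else 0
            total += 1
    return correct / total

def temporal_instability(masks, references):
    """Mean absolute change in per-frame accuracy between successive frames."""
    scores = [pixel_accuracy(m, r) for m, r in zip(masks, references)]
    if len(scores) < 2:
        return 0.0  # a single frame has no temporal variation
    diffs = [abs(b - a) for a, b in zip(scores, scores[1:])]
    return sum(diffs) / len(diffs)
```

A lower instability value means the segmentation quality fluctuates less from frame to frame, so it can serve as a rough proxy for temporal coherency.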
Funding: Supported by the National Natural Science Foundation of China (Nos. 60772134, 60902081, 60902052); the 111 Project (No. B08038); the Fundamental Research Funds for the Central Universities (No. 72105457).
Abstract: A novel moving object segmentation method is proposed in this paper. A modified three-dimensional recursive search (3DRS) algorithm is used to obtain accurate motion information. A motion feature descriptor (MFD) is designed to describe the motion feature of each block in a picture based on motion intensity, motion in occlusion areas, and motion correlation among neighbouring blocks. A fuzzy C-means (FCM) clustering algorithm is then applied to these MFDs to segment moving objects. Moreover, a new parameter, named gathering degree, is used to distinguish foreground moving objects from background motion. Experimental results demonstrate the effectiveness of the proposed method.
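The clustering step mentioned above can be illustrated with the standard fuzzy C-means update rules. This is a generic textbook sketch, not the authors' implementation: the feature vectors stand in for the per-block motion descriptors, and the function name, the fixed iteration count and the caller-supplied initial centers are all assumptions.

```python
import math

def fuzzy_c_means(points, centers, m=2.0, iterations=50, eps=1e-12):
    """Standard fuzzy C-means on a list of feature vectors.

    points  : list of equal-length numeric tuples (stand-ins for block descriptors)
    centers : initial cluster centers, one tuple per cluster
    Returns (centers, memberships), where memberships[j][i] is the degree
    to which point j belongs to cluster i.
    """
    centers = [tuple(c) for c in centers]
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for _ in range(iterations):
        # Membership update: closer centers receive higher membership.
        memberships = []
        for p in points:
            dists = [max(math.dist(p, c), eps) for c in centers]
            row = [1.0 / sum((d_i / d_k) ** exponent for d_k in dists)
                   for d_i in dists]
            memberships.append(row)
        # Center update: mean of the points weighted by membership ** m.
        new_centers = []
        for i in range(len(centers)):
            weights = [row[i] ** m for row in memberships]
            total = sum(weights)
            new_centers.append(tuple(
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(len(points[0]))
            ))
        centers = new_centers
    return centers, memberships
```

Unlike hard k-means, each block keeps a graded membership in every cluster, which is useful near object boundaries where motion is ambiguous.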
Abstract: 3D object recognition is a challenging task for intelligent and robotic systems in industrial and home indoor environments. It is critical for such systems to recognize and segment the 3D object instances that they encounter frequently. The computer vision, graphics, and machine learning fields have all given it considerable attention. Traditionally, 3D segmentation was done with hand-crafted features and designed approaches that did not achieve acceptable performance and could not be generalized to large-scale data. Owing to their great success in 2D computer vision, deep learning approaches have lately become the preferred method for 3D segmentation challenges. However, the task of instance segmentation is currently less explored. In this paper, we propose a novel approach for efficient 3D instance segmentation using red green blue and depth (RGB-D) data based on deep learning. The 2D region-based convolutional neural network (Mask R-CNN) deep learning model with a point-based rendering module is adapted to integrate depth information in order to recognize and segment 3D instances of objects. To generate 3D point cloud coordinates (x, y, z), segmented 2D pixels (u, v) of recognized object regions in the RGB image are merged with the (u, v) points of the depth image. Moreover, we conducted an experiment and analysis to evaluate the proposed method from various points of view and distances. The experiments show that the proposed 3D object recognition and instance segmentation are sufficiently beneficial to support object handling in robotic and intelligent systems.
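The step from segmented pixels (u, v) and a depth image to 3D coordinates (x, y, z) is commonly done with the pinhole camera model. The abstract does not state which camera model or calibration the authors use, so the sketch below is an assumption: the intrinsics `fx`, `fy`, `cx`, `cy`, the function name, and the convention that a depth of 0 marks an invalid reading are all hypothetical.

```python
def backproject(mask_pixels, depth, fx, fy, cx, cy):
    """Lift segmented pixels (u, v) to camera-frame points (x, y, z).

    mask_pixels    : iterable of (u, v) pixel coordinates inside the object mask
    depth          : depth[v][u] gives the depth (e.g. in metres), 0 where invalid
    fx, fy, cx, cy : pinhole intrinsics, assumed known from calibration
    """
    points = []
    for u, v in mask_pixels:
        z = depth[v][u]
        if z <= 0:
            continue  # skip pixels with no valid depth reading
        x = (u - cx) * z / fx
        y = (v - cy) * z / fy
        points.append((x, y, z))
    return points
```

This assumes the RGB and depth images are already registered to each other, so that the same (u, v) index refers to the same scene point in both.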
Abstract: A fast interactive segmentation algorithm for image sequences based on relative fuzzy connectedness is presented. Compared with the original algorithm, the proposed one achieves the same accuracy while tripling the segmentation speed for a single image. The fast segmentation algorithm is also extended from a single object to multiple objects and from a single image to image sequences. Thus, segmentation of multiple objects from a complex background and batch segmentation of image sequences can be achieved. In addition, a post-processing scheme is incorporated into the algorithm, which extracts a smooth, one-pixel-wide edge for each segmented object. The experimental results illustrate that the proposed algorithm can obtain the object regions of interest from medical images or image sequences, as well as from man-made images, quickly and reliably with only a little interaction.
Abstract: Undeniably, Deep Learning (DL) has rapidly displaced traditional machine learning in the Remote Sensing (RS) and geoscience domains, with applications such as scene understanding, material identification, extreme weather detection, and oil spill identification, among many others. Traditional machine learning algorithms are given less and less attention in the era of big data. Recently, a substantial amount of work has aimed at developing image classification approaches that build on the success of DL models in computer vision. The number of relevant articles has nearly doubled every year since 2015. Advances in remote sensing technology, as well as the rapidly expanding volume of publicly available satellite imagery on a worldwide scale, have opened up possibilities for a wide range of modern applications. However, there are challenges related to the availability of annotated data, the complex nature of the data, and model parameterization, which strongly impact performance. In this article, a comprehensive review of the literature is presented, encompassing a broad spectrum of pioneering work in remote sensing image classification and covering network architectures (vintage Convolutional Neural Networks, CNN; Fully Convolutional Networks, FCN; encoder-decoder networks; recurrent networks; attention models; and generative adversarial models). The characteristics, capabilities, and limitations of current DL models are examined, and potential research directions are discussed.
Abstract: Computer vision systems have spread impressively, both in practical application and in theoretical research. A common approach in such systems relies on a good segmentation of moving objects from video sequences. This paper presents an automatic algorithm for segmenting and extracting moving objects, suitable for indoor and outdoor video applications where the background scene can be captured beforehand. Since edge detection is often used to extract accurate boundaries of the objects in an image, the first step in our algorithm combines two edge maps, detected from the frame difference between two consecutive frames and from the background subtraction. After removing edge points that belong to the background, the resulting moving edge map is fed to the object extraction step. A fundamental task in this step is to declare the candidates of the moving object, followed by applying morphological operations. The algorithm is implemented on a real video sequence as well as an MPEG-4 sequence, and good segmentation results are achieved.
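The first stage described in this abstract (two edge maps combined, then background edges removed) can be sketched as follows. The abstract does not name the edge detector or the combination rule, so this is an assumed simplification: a thresholded forward-difference gradient stands in for the edge detector, the two maps are merged with a logical OR, and the threshold value is arbitrary. All function names are hypothetical.

```python
def abs_diff(a, b):
    """Per-pixel absolute difference of two equally sized grayscale images."""
    return [[abs(x - y) for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def edge_map(img, threshold):
    """Binary edge map from the L1 magnitude of forward-difference gradients."""
    h, w = len(img), len(img[0])
    edges = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            gx = img[y][min(x + 1, w - 1)] - img[y][x]
            gy = img[min(y + 1, h - 1)][x] - img[y][x]
            edges[y][x] = abs(gx) + abs(gy) > threshold
    return edges

def moving_edges(prev, cur, background, threshold=30):
    """Edges of the frame difference or background subtraction, minus background edges."""
    diff_edges = edge_map(abs_diff(cur, prev), threshold)
    sub_edges = edge_map(abs_diff(cur, background), threshold)
    bg_edges = edge_map(background, threshold)
    h, w = len(cur), len(cur[0])
    return [[(diff_edges[y][x] or sub_edges[y][x]) and not bg_edges[y][x]
             for x in range(w)] for y in range(h)]
```

The resulting boolean map would then feed the object extraction step, where candidate regions are declared and cleaned up with morphological operations.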
Abstract: Interactive image segmentation (IIS) is an important technique for obtaining pixel-level annotations. In many cases, target objects share similar semantics. However, IIS methods neglect this connection, and in particular the cues provided by representations of previously segmented objects, previous user interaction, and previous prediction masks, which can all provide suitable priors for the current annotation. In this paper, we formulate a sequential interactive image segmentation (SIIS) task for minimizing user interaction when segmenting sequences of related images, and we provide a practical approach to this task using two pertinent designs. The first is a novel interaction mode. When annotating a new sample, our method can automatically propose an initial click based on previous annotations. This dramatically reduces the interaction burden on the user. The second is an online optimization strategy that provides semantic information when annotating specific targets by optimizing the model with dense supervision from previously labeled samples. Experiments demonstrate the effectiveness of regarding SIIS as a distinct task and of our methods for addressing it.
Funding: Supported in part by the National Natural Science Foundation of China under Grant No. 61075026 and the National Basic Research 973 Program of China under Grant No. 2011CB302203.
Abstract: In this paper we present a simultaneous segmentation algorithm for multiple highly occluded objects, which combines high-level knowledge and low-level information in a unified framework. The high-level knowledge provides sophisticated shape priors that take into account the blocking relationship between nearby objects. Unlike the conventional layered model, which attempts to solve the full ordering problem, we decompose the problem into a series of pairwise ones, which makes our algorithm scalable to a large number of objects. Objects are segmented at the pixel level with higher-order soft constraints from superpixels, using a dual-level conditional random field. The model is optimized by alternating between object layout and pixel-wise segmentation. We evaluate our system on different objects, i.e., clothing and pedestrians, and show impressive segmentation results and significant improvement over state-of-the-art segmentation algorithms.
Funding: The National Natural Science Foundation of China (No. 61702323).
Abstract: Moving object segmentation (MOS) is one of the essential functions of the vision system of all robots, including medical robots. Deep learning-based MOS methods, especially deep end-to-end MOS methods, are actively investigated in this field. Foreground segmentation networks (FgSegNets) are representative deep end-to-end MOS methods proposed recently. This study explores a new mechanism to improve the spatial feature learning capability of FgSegNets with relatively few additional parameters. Specifically, we propose an enhanced attention (EA) module, a parallel connection of an attention module and a lightweight enhancement module, with sequential attention and residual attention as special cases. We also propose integrating EA with FgSegNet_v2 by taking the lightweight convolutional block attention module as the attention module and plugging the EA module in after the two Maxpooling layers of the encoder. The derived new model is named FgSegNet_v2 EA. The ablation study verifies the effectiveness of the proposed EA module and integration strategy. The results on the CDnet2014 dataset, which depicts human activities and vehicles captured in different scenes, show that FgSegNet_v2 EA outperforms FgSegNet_v2 by 0.08% and 14.5% under the settings of scene-dependent evaluation and scene-independent evaluation, respectively, which indicates the positive effect of EA on improving the spatial feature learning capability of FgSegNet_v2.