In this paper, an efficient compressed-domain moving object segmentation algorithm is proposed, in which the motion vector (MV) field parsed from the compressed video is the only cue used for moving object segmentation. First, the MV field is temporally and spatially normalized and then accumulated by an iterative backward projection to enhance salient motions and alleviate the effect of noisy MVs. The accumulated MV field is then segmented into motion-homogeneous regions using a modified statistical region growing approach. Finally, moving object regions are extracted in turn by minimizing the joint prediction error of the estimated motion models of two region sets: one containing the candidate object region and one containing the remaining regions. Experimental results on several H.264 compressed video sequences demonstrate good segmentation performance.
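The normalization step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `BlockMV` record, quarter-pel MV units, a uniform 4x4 output grid, and the restriction to past reference frames (so `ref_dist >= 1`) are all assumptions, and the iterative backward projection and region growing stages are not shown.

```python
from dataclasses import dataclass


@dataclass
class BlockMV:
    """Hypothetical record for one parsed H.264 partition."""
    x: int          # top-left column of the partition, in pixels
    y: int          # top-left row of the partition, in pixels
    w: int          # partition width in pixels (e.g. 16, 8 or 4)
    h: int          # partition height in pixels
    mvx_qpel: int   # horizontal MV in quarter-pel units (assumed)
    mvy_qpel: int   # vertical MV in quarter-pel units (assumed)
    ref_dist: int   # distance in frames to a past reference picture (assumed >= 1)


def normalize_mv_field(blocks, width, height, grid=4):
    """Map variable-size partitions onto a uniform grid and express every MV
    in pixels per frame, so MVs from different partitions are comparable."""
    cols, rows = width // grid, height // grid
    field = [[(0.0, 0.0) for _ in range(cols)] for _ in range(rows)]
    for b in blocks:
        # Temporal normalization: quarter-pel -> pixels, then divide by frame distance.
        d = max(b.ref_dist, 1)
        vx = b.mvx_qpel / 4.0 / d
        vy = b.mvy_qpel / 4.0 / d
        # Spatial normalization: copy the MV into every grid cell the partition covers.
        for r in range(b.y // grid, (b.y + b.h) // grid):
            for c in range(b.x // grid, (b.x + b.w) // grid):
                field[r][c] = (vx, vy)
    return field
```

Cells not covered by any parsed partition (for example intra-coded blocks) stay at zero motion in this sketch; how the paper treats them is not stated in the abstract.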
Recently, video object segmentation has received great attention in the computer vision community. Most existing methods rely heavily on pixel-wise human annotations, which are expensive and time-consuming to obtain. To tackle this problem, we make an early attempt at video object segmentation with scribble-level supervision, which can greatly reduce the human labor needed to collect manual annotations. However, conventional network architectures and learning objectives do not work well in this scenario because the supervision information is highly sparse and incomplete. To address this issue, this paper introduces two novel elements for learning the video object segmentation model. The first is the scribble attention module, which captures more accurate context information and learns an effective attention map to enhance the contrast between foreground and background. The second is the scribble-supervised loss, which can optimize the unlabeled pixels and dynamically correct inaccurately segmented areas during training. To evaluate the proposed method, we conduct experiments on two video object segmentation benchmark datasets: YouTube-VOS (video object segmentation) and DAVIS-2017 (densely annotated video segmentation). We first generate scribble annotations from the original per-pixel annotations. Then, we train our model and compare its test performance with baseline models and other existing works. Extensive experiments demonstrate that the proposed method works effectively and approaches the performance of methods that require dense per-pixel annotations.
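As a point of reference for scribble supervision, the sketch below shows a partial cross-entropy computed only over scribble-labeled pixels. This is a common baseline, not the paper's scribble-supervised loss, which additionally optimizes unlabeled pixels; the `IGNORE` label and the flat per-pixel list layout are assumptions made for illustration.

```python
import math

IGNORE = -1  # assumed label for pixels not touched by any scribble


def partial_cross_entropy(probs, labels):
    """Mean negative log-likelihood over scribble-labeled pixels only.

    probs:  list of per-pixel class-probability lists (e.g. softmax outputs)
    labels: list of per-pixel class indices, IGNORE for unlabeled pixels
    """
    total, count = 0.0, 0
    for p, y in zip(probs, labels):
        if y == IGNORE:
            continue  # unlabeled pixels contribute nothing to this term
        total -= math.log(max(p[y], 1e-12))  # clamp to avoid log(0)
        count += 1
    return total / count if count else 0.0
```

Because only the sparse scribble pixels carry gradient here, this baseline illustrates why the abstract argues that extra terms for unlabeled pixels are needed.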
This paper addresses the problem of real-time object segmentation for picking systems. A region proposal method based on a convolutional neural network and inspired by the human glance is proposed to select promising regions, so that more intensive processing is reserved for these regions only. This region proposal method significantly improves the speed of object segmentation. By combining the CNN-based region proposal method with a superpixel method, both category and location information can be used to segment objects, and image redundancy is significantly reduced. This considerably reduces processing time, enabling real-time operation. Experiments show that the proposed method can segment the target object of interest in real time on an ordinary laptop.
Given today's demand for communicating and storing huge amounts of data, photographs in formal documents increasingly need to occupy little storage space. Traditional compression algorithms do not fully exploit the distinctive characteristics of formal photographs: the object is an image of a human head, and the background is a single color. As a result, compression efficiency is low and the compressed image still consumes considerable space. This paper presents an image compression algorithm based on object segmentation for practical high-efficiency applications. To achieve high coding efficiency, shape-adaptive discrete wavelet transforms are used to transform arbitrarily shaped objects. The human head and the background are compressed separately to reduce the coding redundancy of the background. Two methods, lossless image contour coding based on a differential chain code and a modified set partitioning in hierarchical trees (SPIHT) algorithm for arbitrary shapes, are discussed in detail. Experimental results show that at 0.078 bits per pixel (bpp), the peak signal-to-noise ratio (PSNR) of the reconstructed photograph exceeds that of standard SPIHT by nearly 4 dB.
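Differential chain coding, one of the two methods named above, can be illustrated with a short sketch. It assumes an 8-connected closed contour given as (x, y) points with y pointing down and the standard Freeman direction numbering; the function names are hypothetical, and the paper's exact code layout and entropy coding are not reproduced. On smooth contours the differences cluster around 0, 1 and 7 (straight ahead or a slight turn), which is what makes them cheaper to entropy-code than absolute directions.

```python
# Freeman 8-direction offsets (dx, dy) with y pointing down; list index = code.
DIRS = [(1, 0), (1, -1), (0, -1), (-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1)]


def freeman_chain(points):
    """Absolute chain code of an 8-connected closed contour."""
    codes = []
    n = len(points)
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]  # wrap around to close the contour
        codes.append(DIRS.index((x1 - x0, y1 - y0)))
    return codes


def differential_chain(codes):
    """Keep the first code; replace every later code by its change mod 8."""
    return [codes[0]] + [(codes[i] - codes[i - 1]) % 8 for i in range(1, len(codes))]


def decode_differential(diff):
    """Invert differential_chain, recovering the absolute codes exactly."""
    codes = [diff[0]]
    for d in diff[1:]:
        codes.append((codes[-1] + d) % 8)
    return codes
```

Decoding is exact, so the contour coding stays lossless; a step that is not 8-connected raises `ValueError` from `list.index`.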
With the development of the modern information society, more and more multimedia information is available, so multimedia processing technology is becoming an important task for scientists in related fields. Among the various kinds of multimedia, visual information is especially attractive because it is direct and vivid, but at the same time the huge amount of video data poses many challenges for video storage, processing, and transmission.
Current mainstream unsupervised video object segmentation (UVOS) approaches typically incorporate optical flow as motion information to locate the primary objects in coherent video frames. However, they fuse appearance and motion information without evaluating the quality of the optical flow. When poor-quality optical flow interacts with the appearance information, it introduces significant noise and degrades overall performance. To alleviate this issue, we first employ a quality evaluation module (QEM) to evaluate the optical flow. Then, we select high-quality optical flow as motion cues to fuse with the appearance information, which prevents poor-quality optical flow from diverting the network's attention. Moreover, we design an appearance-guided fusion module (AGFM) to better integrate appearance and motion information. Extensive experiments on several widely used datasets, including DAVIS-16, FBMS-59, and YouTube-Objects, demonstrate that the proposed method outperforms existing methods.
Moving object segmentation is one of the most challenging issues in computer vision. In this paper, we propose a new algorithm for foreground segmentation with a static camera. It combines a Gaussian mixture model (GMM) with an active contour method and produces much better results than conventional background subtraction methods. It formulates foreground segmentation as an energy minimization problem and minimizes the energy function using a curve evolution method. Our algorithm integrates the GMM background model, a shadow elimination term, and a curve evolution edge-stopping term into the energy function. It achieves more accurate segmentation than existing methods of the same type. Promising results on real images demonstrate the potential of the presented method.
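The GMM background-model component can be sketched for a single grayscale pixel in the Stauffer-Grimson style. This is a simplified illustration, not the paper's energy formulation: the shadow elimination term and curve evolution are omitted, and the parameter values (`k`, `alpha`, `init_var`, `bg_ratio`, `min_var`) and the fixed learning rate for mean and variance are assumptions.

```python
import math


class PixelGMM:
    """Simplified per-pixel Gaussian mixture background model (grayscale)."""

    def __init__(self, k=3, alpha=0.01, init_var=225.0, bg_ratio=0.7, min_var=4.0):
        self.k, self.alpha = k, alpha
        self.init_var, self.bg_ratio, self.min_var = init_var, bg_ratio, min_var
        self.w, self.mu, self.var = [], [], []  # weights, means, variances

    def update(self, x):
        """Update the model with intensity x; return True if x is foreground."""
        # Order components by w / sigma so the most reliable come first.
        order = sorted(range(len(self.w)),
                       key=lambda i: self.w[i] / math.sqrt(self.var[i]), reverse=True)
        # Background = smallest prefix of components whose weight sum exceeds bg_ratio.
        bg, acc = set(), 0.0
        for i in order:
            bg.add(i)
            acc += self.w[i]
            if acc > self.bg_ratio:
                break
        # First component (in that order) within 2.5 standard deviations of x.
        match = next((i for i in order
                      if abs(x - self.mu[i]) <= 2.5 * math.sqrt(self.var[i])), None)
        for i in range(len(self.w)):
            self.w[i] = (1 - self.alpha) * self.w[i] + (self.alpha if i == match else 0.0)
        if match is not None:
            rho = self.alpha  # simplified learning rate for mean and variance
            d = x - self.mu[match]
            self.mu[match] += rho * d
            self.var[match] = max((1 - rho) * self.var[match] + rho * d * d, self.min_var)
        elif len(self.w) < self.k:
            # No match and spare capacity: add a new wide Gaussian centred at x.
            self.w.append(self.alpha)
            self.mu.append(float(x))
            self.var.append(self.init_var)
        else:
            # No match: replace the lowest-weight component with a new Gaussian at x.
            j = min(range(self.k), key=lambda i: self.w[i])
            self.w[j], self.mu[j], self.var[j] = self.alpha, float(x), self.init_var
        s = sum(self.w)
        self.w = [wi / s for wi in self.w]
        return match is None or match not in bg
```

In the paper this per-pixel evidence feeds an energy term that is minimized by curve evolution, rather than being thresholded directly as here.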
While the development of particular video segmentation algorithms has attracted considerable research interest, relatively little effort has been devoted to providing a methodology for evaluating their performance. In this paper, we propose a methodology for objectively evaluating video segmentation algorithms against ground truth, based on computing the deviation of segmentation results from the reference segmentation. Four metrics, based respectively on pixel classification, edges, relative foreground area, and relative position, are combined to assess spatial accuracy. Temporal coherence is evaluated using the difference in spatial accuracy between successive frames. The experimental results show the feasibility of our approach. Moreover, it is computationally more efficient than previous methods. It can be applied to provide an offline ranking of different segmentation algorithms and to optimally set the parameters of a given algorithm.
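The pixel-classification metric and the temporal coherence idea can be sketched as below. This covers only one of the four spatial metrics, with a plain agreement ratio as an assumed definition; the paper's exact formulas and the way the four metrics are combined are not reproduced.

```python
def pixel_accuracy(seg, ref):
    """Fraction of pixels whose foreground/background label matches the reference.

    seg, ref: equally sized 2D lists of 0/1 (or truthy/falsy) values.
    """
    flat_s = [v for row in seg for v in row]
    flat_r = [v for row in ref for v in row]
    agree = sum(1 for s, r in zip(flat_s, flat_r) if bool(s) == bool(r))
    return agree / len(flat_r)


def temporal_coherence(spatial_scores):
    """Absolute change in spatial accuracy between successive frames;
    smaller values indicate more temporally stable segmentation."""
    return [abs(b - a) for a, b in zip(spatial_scores, spatial_scores[1:])]
```

Per-frame scores from `pixel_accuracy` can be passed straight to `temporal_coherence`, which mirrors the abstract's idea of deriving temporal coherence from frame-to-frame changes in spatial accuracy.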
Moving object segmentation (MOS), which aims to segment moving objects from video frames, is an important and challenging task in computer vision with various applications. With the development of deep learning (DL), MOS has also entered the era of deep models for spatiotemporal feature learning. This paper aims to provide a review of recent DL-based MOS methods proposed during the past three years. Specifically, we present a more up-to-date categorization based on model characteristics, then compare and discuss each category from the perspectives of feature learning (FL) and of model training and evaluation. For FL, the reviewed methods are divided into three types (spatial FL, temporal FL, and spatiotemporal FL) and analyzed in terms of inputs and model architectures; three input types and four typical preprocessing subnetworks are summarized. In terms of training, we discuss ideas for enhancing model transferability. In terms of evaluation, building on a previous categorization into scene-dependent and scene-independent evaluation, and considering whether the videos were recorded with static or moving cameras, we further define four subdivided evaluation setups and analyze the setups used by the reviewed methods. We also present performance comparisons of some reviewed MOS methods and analyze their technical advantages and disadvantages. Finally, based on these comparisons and discussions, we present research prospects and future directions.
Previous video object segmentation approaches mainly focus on simplex solutions linking appearance and motion, limiting effective feature collaboration between these two cues. In this work, we study a novel and efficient full-duplex strategy network (FSNet) to address this issue. It adopts a better mutual restraint scheme between motion and appearance, allowing cross-modal features to be exploited in both the fusion and decoding stages. Specifically, we introduce a relational cross-attention module (RCAM) to achieve bidirectional message propagation across embedding subspaces. To improve the model's robustness and update inconsistent features from the spatiotemporal embeddings, we adopt a bidirectional purification module after the RCAM. Extensive experiments on five popular benchmarks show that our FSNet is robust to various challenging scenarios (e.g., motion blur and occlusion) and compares well with leading methods for both video object segmentation and video salient object detection. The project is publicly available at https://github.com/GewelsJI/FSNet.
We present a lightweight and efficient semi-supervised video object segmentation network based on the space-time memory framework. To some extent, our method addresses two difficulties of traditional video object segmentation: the per-frame computation time is too long, and the segmentation of the current frame should make use of more information from past frames. The algorithm uses a global context (GC) module to achieve high-performance, real-time segmentation. The GC module can effectively integrate multi-frame image information without increasing memory usage and can process each frame in real time. Moreover, because the predicted mask of the previous frame is helpful for segmenting the current frame, we feed it into a spatial constraint module (SCM), which constrains the areas of segments in the current frame. The SCM effectively alleviates mismatching of similar targets while consuming few additional resources. We add a refinement module to the decoder to improve boundary segmentation. Our model achieves state-of-the-art results on various datasets, scoring 80.1% on YouTube-VOS 2018 and a J&F score of 78.0% on DAVIS 2017, while taking 0.05 s per frame on the DAVIS 2016 validation set.
Cancer is one of the leading causes of death in the world, and radiotherapy is one of the treatment options. Radiotherapy planning starts with delineating the affected area from the healthy organs, called organs at risk (OARs). A new approach to automatic OAR segmentation in the chest cavity in computed tomography (CT) images is presented. The proposed approach builds on a modified U-Net architecture with a ResNet-34 encoder, which serves as the baseline in this work. A new two-branch CS-SA U-Net architecture is proposed. It consists of two parallel U-Net models in which self-attention blocks that use cosine similarity as the query-key similarity function (CS-SA blocks) are inserted between the encoder and decoder, which enables the use of consistency regularisation. The proposed solution demonstrates state-of-the-art performance for OAR segmentation in CT images on the publicly available SegTHOR benchmark dataset in terms of the Dice coefficient (oesophagus: 0.8714, heart: 0.9516, trachea: 0.9286, aorta: 0.9510) and the Hausdorff distance (oesophagus: 0.2541, heart: 0.1514, trachea: 0.1722, aorta: 0.1114), and it significantly outperforms the baseline. The approach is shown to be viable for improving the quality of OAR segmentation for radiotherapy planning.
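The two reported measures can be computed as in the following sketch. Masks are nested lists of 0/1 values, and the Hausdorff distance is taken between two non-empty point sets in whatever coordinate units they are given; treating two empty masks as Dice 1.0 is a convention assumed here, and how the paper normalizes its Hausdorff values is not stated in the abstract.

```python
import math


def dice(a, b):
    """Dice coefficient 2|A and B| / (|A| + |B|) of two binary masks (nested lists)."""
    inter = sum(1 for ra, rb in zip(a, b) for x, y in zip(ra, rb) if x and y)
    total = sum(1 for r in a for x in r if x) + sum(1 for r in b for y in r if y)
    return 2.0 * inter / total if total else 1.0  # both empty: assumed perfect match


def hausdorff(pa, pb):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    def directed(p, q):
        # Largest distance from a point of p to its nearest point of q.
        return max(min(math.dist(x, y) for y in q) for x in p)
    return max(directed(pa, pb), directed(pb, pa))
```

The brute-force nearest-point search is quadratic in the number of points; it is meant to show the definition, not to be efficient on full CT volumes.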
A novel moving object segmentation method is proposed in this paper. A modified three-dimensional recursive search (3DRS) algorithm is used to obtain accurate motion information. A motion feature descriptor (MFD) is designed to describe the motion features of each block in a picture based on motion intensity, motion in occlusion areas, and motion correlation among neighbouring blocks. Then, a fuzzy C-means (FCM) clustering algorithm is applied to these MFDs to segment moving objects. Moreover, a new parameter named gathering degree is used to distinguish foreground moving objects from background motion. Experimental results demonstrate the effectiveness of the proposed method.
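A plain fuzzy C-means loop over feature vectors is sketched below. Any vectors passed in stand in for MFDs, whose actual composition is not reproduced; the fuzzifier `m = 2`, the deterministic initialization, and the stopping tolerance are assumptions, and the gathering-degree parameter is not modeled.

```python
import math


def fcm(points, c=2, m=2.0, iters=100, tol=1e-6):
    """Fuzzy C-means on a list of feature vectors; returns (centers, memberships)."""
    n, dim = len(points), len(points[0])
    # Deterministic initialization: spread initial centers over the input order.
    centers = [list(points[(i * n) // c]) for i in range(c)]
    u = [[0.0] * c for _ in range(n)]
    for _ in range(iters):
        # Membership update: u_ki = 1 / sum_j (d_ki / d_kj) ** (2 / (m - 1)).
        for k, x in enumerate(points):
            d = [max(math.dist(x, v), 1e-12) for v in centers]  # avoid division by zero
            for i in range(c):
                u[k][i] = 1.0 / sum((d[i] / d[j]) ** (2.0 / (m - 1)) for j in range(c))
        # Center update: mean of the points weighted by u_ki ** m.
        new_centers = []
        for i in range(c):
            wts = [u[k][i] ** m for k in range(n)]
            s = sum(wts)
            new_centers.append([sum(w * p[t] for w, p in zip(wts, points)) / s
                                for t in range(dim)])
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, u
```

A hard block labeling can be obtained by taking, for each block, the cluster with the largest membership value.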
3D object recognition is a challenging task for intelligent and robotic systems in industrial and home indoor environments. It is critical for such systems to recognize and segment the 3D object instances that they frequently encounter. The problem has received considerable attention in the computer vision, graphics, and machine learning fields. Traditionally, 3D segmentation was performed with hand-crafted features and hand-designed approaches that did not achieve acceptable performance and could not generalize to large-scale data. Owing to their great success in 2D computer vision, deep learning approaches have recently become the preferred method for 3D segmentation. However, instance segmentation is currently less explored. In this paper, we propose a novel deep-learning-based approach for efficient 3D instance segmentation using red, green, blue, and depth (RGB-D) data. The 2D region-based convolutional neural network Mask R-CNN, with a point-based rendering module, is adapted to integrate depth information in order to recognize and segment 3D object instances. To generate 3D point cloud coordinates (x, y, z), the segmented 2D pixels (u, v) of recognized object regions in the RGB image are merged with the corresponding (u, v) points of the depth image. Moreover, we conducted experiments to evaluate and compare the proposed method from various viewpoints and distances. The experiments show that the proposed 3D object recognition and instance segmentation are sufficiently effective to support object handling in robotic and intelligent systems.
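The (u, v) to (x, y, z) step can be illustrated with standard pinhole back-projection. This sketch assumes the depth image is registered to the RGB image and that the camera intrinsics `fx`, `fy`, `cx`, `cy` and a depth scale are known; these names are hypothetical, and the paper's exact merging procedure may differ.

```python
def backproject(pixels, depth, fx, fy, cx, cy, depth_scale=0.001):
    """Convert segmented pixels (u, v) plus an aligned depth image into 3D points.

    pixels:      iterable of (u, v) coordinates inside one instance mask
    depth:       2D list indexed depth[v][u], in raw sensor units
    fx, fy:      focal lengths in pixels (assumed known from calibration)
    cx, cy:      principal point in pixels (assumed known from calibration)
    depth_scale: factor converting raw depth to metres (0.001 assumes millimetres)
    """
    points = []
    for u, v in pixels:
        z = depth[v][u] * depth_scale
        if z <= 0:
            continue  # skip missing or invalid depth readings
        x = (u - cx) * z / fx
        y = (v - cy) * z / fy
        points.append((x, y, z))
    return points
```

Applying this to every pixel of one Mask R-CNN instance mask yields that instance's partial point cloud in the camera frame.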
A fast interactive segmentation algorithm for image sequences based on relative fuzzy connectedness is presented. Compared with the original algorithm, the proposed one achieves the same accuracy while segmenting a single image three times faster. In addition, the fast algorithm is extended from a single object to multiple objects and from single images to image sequences, so that multiple objects can be segmented from a complex background and image sequences can be segmented in batches. A post-processing step is also incorporated into the algorithm to extract a smooth, one-pixel-wide edge for each segmented object. The experimental results show that the proposed algorithm can quickly and reliably extract the object regions of interest from medical images or image sequences, as well as from man-made images, with only a little user interaction.
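Relative fuzzy connectedness can be sketched with a max-min variant of Dijkstra's algorithm: the strength of a path is its weakest affinity, a pixel's connectedness to a seed set is the strength of its strongest path, and each pixel is assigned to the object whose connectedness is strictly highest. The Gaussian intensity-difference affinity, 4-adjacency, and `sigma` below are assumptions; this is not the paper's accelerated algorithm.

```python
import heapq
import math


def fuzzy_connectedness(img, seeds, sigma=10.0):
    """Max-min (fuzzy) connectedness of every pixel to a seed set, 4-adjacency."""
    h, w = len(img), len(img[0])
    conn = [[0.0] * w for _ in range(h)]
    heap = []
    for r, c in seeds:
        conn[r][c] = 1.0
        heapq.heappush(heap, (-1.0, r, c))  # max-heap via negated strengths
    while heap:
        neg, r, c = heapq.heappop(heap)
        strength = -neg
        if strength < conn[r][c]:
            continue  # stale heap entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w:
                diff = img[r][c] - img[nr][nc]
                aff = math.exp(-diff * diff / (2 * sigma * sigma))  # assumed affinity
                cand = min(strength, aff)  # a path is only as strong as its weakest link
                if cand > conn[nr][nc]:
                    conn[nr][nc] = cand
                    heapq.heappush(heap, (-cand, nr, nc))
    return conn


def relative_fc_labels(img, seed_sets, sigma=10.0):
    """Label each pixel with the object of strictly highest connectedness; -1 on ties."""
    maps = [fuzzy_connectedness(img, s, sigma) for s in seed_sets]
    h, w = len(img), len(img[0])
    labels = [[-1] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            vals = [m[r][c] for m in maps]
            best = max(vals)
            if vals.count(best) == 1:
                labels[r][c] = vals.index(best)
    return labels
```

Each object only needs its own seed set, which is why the relative formulation extends naturally from one object to several, as the abstract describes.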
Urban land provides suitable locations for various economic activities, which affect the development of surrounding areas. With rapid industrialization and urbanization, conflicts in land use are becoming more noticeable. Urban administrators and decision-makers seek modern methods and technology to provide information support for urban growth. Recently, with the rapid development of high-resolution sensor technology, more relevant data can be obtained, which is an advantage for studying the sustainable development of urban land use. However, these data are only information sources and are a mixture of "information" and "noise". Processing, analysis, and information extraction are necessary to obtain useful information from remote sensing data. This paper extracts urban land-use information from a high-resolution image using multi-feature information of the image objects, adopting an object-oriented image analysis approach and multi-scale image segmentation technology. A classification and extraction model is set up based on the multiple features of the image objects in order to provide information for rational planning and effective management. This new image analysis approach offers a satisfactory solution for extracting information quickly and efficiently.
This paper proposes a motion-based region growing segmentation scheme for object-based video coding, which segments an image into homogeneous regions characterized by coherent motion. It adopts a block matching algorithm to estimate motion vectors and uses morphological tools, such as open-close by reconstruction and the region-growing version of the watershed algorithm, for spatial segmentation to improve the temporal segmentation. To obtain reliable motion vectors, this paper also proposes a change detection algorithm and a multi-candidate pre-screening motion estimation method. Preliminary simulation results demonstrate that the proposed scheme is feasible. The main advantage of the scheme is its low computational load.
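For reference, a plain full-search block matching step that minimizes the sum of absolute differences (SAD) is sketched below. It shows the basic technique only; the paper's change detection and multi-candidate pre-screening are not included, and the block size and search range are assumed values.

```python
def block_match(cur, ref, bx, by, bsize=8, search=4):
    """Full-search block matching: return ((dx, dy), sad) for the displacement
    within +/-search that minimizes the SAD of the block at (bx, by)."""
    h, w = len(ref), len(ref[0])
    best, best_sad = (0, 0), float("inf")
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates whose block would fall outside the reference frame.
            if not (0 <= bx + dx and bx + dx + bsize <= w and
                    0 <= by + dy and by + dy + bsize <= h):
                continue
            sad = 0
            for y in range(bsize):
                for x in range(bsize):
                    sad += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
            # On equal SAD, prefer the shorter displacement.
            shorter = dx * dx + dy * dy < best[0] ** 2 + best[1] ** 2
            if sad < best_sad or (sad == best_sad and shorter):
                best, best_sad = (dx, dy), sad
    return best, best_sad
```

Full search costs (2 * search + 1)^2 SAD evaluations per block, which is the load that candidate pre-screening schemes aim to reduce.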
Object-based classification uses remote sensing data to differentiate forest gaps from canopies at a large regional scale. To study the segmentation and classification processes of object-based forest gap classification at a regional scale, we sampled a natural secondary forest at the Maoershan Experimental Forest Farm in northeast China. Airborne light detection and ranging (LiDAR; 3.7 points/m²) data were collected as the original data source, and the canopy height model (CHM) and a topographic dataset were extracted from the LiDAR data. The accuracy of object-based forest gap classification depends on the preceding segmentation. Thus, our first step was to define 10 different scale parameters for CHM image segmentation. After image segmentation, a machine learning classification method was used to classify three object classes, namely forest gaps, tree canopies, and others. A common support vector machine (SVM) classifier with a radial basis function (RBF) kernel was first adopted to test the effect of classification features (vegetation height features and some typical topographic features) on forest gap classification. Then, different classifiers (KNN, Bayes, decision tree, and SVM with a linear kernel) were adopted to compare the effect of the classifier on machine learning forest gap classification. Segmentation accuracy and classification accuracy were evaluated using Möller's method and confusion matrices, respectively. The scale parameter had a significant effect on object-based forest gap segmentation and classification. Classification accuracies at different scales revealed two optimal scales (10 and 20) that provided similar accuracy, with the scale of 10 yielding slightly greater accuracy than 20.
Using a combination of height features and an SVM classifier with a linear kernel, the classification accuracy was 91% at the optimal scale parameter of 10, the highest among the classifiers compared, such as SVM RBF (90%), decision tree (90%), Bayes (90%), and KNN (87%). The classifiers had no significant effect on forest gap classification, but fewer parameters in the classifier equation and faster operation probably led to higher accuracy of the final classification. Our results confirm that object-based classification can extract forest gaps at a large regional scale using LiDAR data with appropriate classification features and classifiers. We note, however, that a satisfactory final forest gap classification depends on determining the optimal segmentation scale(s).
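Overall and per-class accuracy from a confusion matrix, the evaluation used for classification above, can be computed as follows. The row/column convention is an assumption, and any counts passed in should come from the actual classification results; none of the paper's matrices are reproduced here.

```python
def overall_accuracy(cm):
    """Overall accuracy = trace / total of a square confusion matrix
    (rows = reference class, columns = predicted class)."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def producer_accuracy(cm, i):
    """Per-class recall: share of reference objects of class i classified correctly."""
    return cm[i][i] / sum(cm[i])
```

For three classes (forest gaps, tree canopies, others) the matrix is 3x3, and an overall accuracy such as the 91% reported above corresponds to the trace divided by the total number of evaluated objects.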
Funding (compressed-domain moving object segmentation): Project supported by the National Natural Science Foundation of China (Grant No. 60572127), the Development Foundation of Shanghai Municipal Commission of Education (Grant No. 05AZ43), and the Shanghai Leading Academic Discipline Project (Grant No. T0102).
Funding (scribble-supervised video object segmentation): supported in part by the National Key R&D Program of China (2017YFB0502904) and the National Science Foundation of China (61876140).
Funding (real-time object segmentation for picking systems): supported by the National Natural Science Foundation of China (61233010, 61305106), the Shanghai Natural Science Foundation (17ZR1409700, 18ZR1415300), and the basic research project of Shanghai Municipal Science and Technology Commission (16JC1400900).
Funding (formal-photograph compression based on object segmentation): This work was supported by the National Natural Science Foundation of China (No. 60372066).
Funding (unsupervised video object segmentation with optical-flow quality evaluation): supported by the National Natural Science Foundation of China (No. 61872189).
Funding (GMM and active contour foreground segmentation): Supported by the National Basic Research Program of China (Grant No. 2006CB303105), the Chinese Ministry of Education Innovation Team Fund Project (Grant No. IRT0707), the National Natural Science Foundation of China (Grant Nos. 60673109 and 60801053), the Beijing Excellent Doctoral Thesis Program (Grant No. YB20081000401), the Beijing Municipal Natural Science Foundation (Grant No. 4082025), and the Doctoral Foundation of China (Grant No. 20070004037).
文摘Moving object segmentation is one of the most challenging issues in computer vision. In this paper, we propose a new algorithm for static camera foreground segmentation. It combines Gaussian mixture model (GMM) and active contours method, and produces much better results than conventional background subtraction methods. It formulates foreground segmentation as an energy minimization problem and minimizes the energy function using curve evolution method. Our algorithm integrates the GMM background model, shadow elimination term and curve evolution edge stopping term into energy function. It achieves more accurate segmentation than existing methods of the same type. Promising results on real images demonstrate the potential of the presented method.
Abstract: While the development of particular video segmentation algorithms has attracted considerable research interest, relatively little effort has been devoted to providing a methodology for evaluating their performance. In this paper, we propose a methodology for objectively evaluating video segmentation algorithms against ground truth, based on computing the deviation of the segmentation results from the reference segmentation. Four different metrics, based respectively on pixel classification, edges, relative foreground area, and relative position, are combined to assess spatial accuracy. Temporal coherency is evaluated using the difference in spatial accuracy between successive frames. The experimental results show the feasibility of our approach. Moreover, it is computationally more efficient than previous methods. It can be applied to provide an offline ranking of different segmentation algorithms and to optimally set the parameters of a given algorithm.
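The sketch below shows one way to compute per-frame spatial scores and a temporal coherency measure, assuming binary masks. It covers only two of the four metric types: pixel classification and relative foreground area. The edge and position metrics are omitted. The weight `w_pixel` and the exact area formula are assumptions, not the paper's definitions.

```python
from typing import List, Sequence

Mask = Sequence[Sequence[int]]  # binary mask: 1 = foreground, 0 = background


def pixel_accuracy(pred: Mask, ref: Mask) -> float:
    """Fraction of pixels whose foreground/background label matches the reference."""
    total = correct = 0
    for row_p, row_r in zip(pred, ref):
        for p, r in zip(row_p, row_r):
            total += 1
            correct += int(p == r)
    return correct / total


def relative_area_accuracy(pred: Mask, ref: Mask) -> float:
    """Assumed area term: 1 - |A_pred - A_ref| / A_ref, clipped at 0."""
    a_pred = sum(map(sum, pred))
    a_ref = sum(map(sum, ref))
    if a_ref == 0:
        return 1.0 if a_pred == 0 else 0.0
    return max(0.0, 1.0 - abs(a_pred - a_ref) / a_ref)


def spatial_accuracy(pred: Mask, ref: Mask, w_pixel: float = 0.5) -> float:
    """Weighted combination of the two metrics above (weights are hypothetical)."""
    return w_pixel * pixel_accuracy(pred, ref) + (1 - w_pixel) * relative_area_accuracy(pred, ref)


def temporal_coherency(spatial_scores: List[float]) -> List[float]:
    """Absolute change in spatial accuracy between successive frames (lower = more stable)."""
    return [abs(b - a) for a, b in zip(spatial_scores, spatial_scores[1:])]
```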
Funding: National Natural Science Foundation of China (Nos. 61702323 and 62172268); the Shanghai Municipal Natural Science Foundation, China (No. 20ZR1423100); the Open Fund of Science and Technology on Thermal Energy and Power Laboratory (No. TPL2020C02), Wuhan 2nd Ship Design and Research Institute, Wuhan, China; the National Key Research and Development Program of China (No. 2018YFB1306303); and the Major Basic Research Projects of Natural Science Foundation of Shandong Province, China (No. ZR2019ZD07).
Abstract: Moving object segmentation (MOS), which aims to segment moving objects from video frames, is an important and challenging task in computer vision with various applications. With the development of deep learning (DL), MOS has also entered the era of deep models for spatiotemporal feature learning. This paper aims to provide the latest review of recent DL-based MOS methods proposed during the past three years. Specifically, we present a more up-to-date categorization based on model characteristics, then compare and discuss each category from the perspectives of feature learning (FL) and of model training and evaluation. For FL, the reviewed methods are divided into three types, namely spatial FL, temporal FL, and spatiotemporal FL, and are then analyzed in terms of their inputs and model architectures; three input types and four typical preprocessing subnetworks are summarized. In terms of training, we discuss ideas for enhancing model transferability. In terms of evaluation, building on a previous categorization into scene-dependent and scene-independent evaluation, and considering whether the videos used were recorded with static or moving cameras, we further provide four subdivided evaluation setups and analyze the setups used by the reviewed methods. We also show performance comparisons of some reviewed MOS methods and analyze their technical advantages and disadvantages. Finally, based on the above comparisons and discussions, we present research prospects and future directions.
Funding: This work was supported by the National Natural Science Foundation of China (62176169, 61703077, and 62102207).
Abstract: Previous video object segmentation approaches mainly focus on simplex solutions linking appearance and motion, limiting effective feature collaboration between these two cues. In this work, we study a novel and efficient full-duplex strategy network (FSNet) to address this issue. It considers a better mutual restraint scheme linking motion and appearance, allowing cross-modal features to be exploited in the fusion and decoding stages. Specifically, we introduce a relational cross-attention module (RCAM) to achieve bidirectional message propagation across embedding sub-spaces. To improve the model's robustness and update inconsistent features from the spatiotemporal embeddings, we adopt a bidirectional purification module after the RCAM. Extensive experiments on five popular benchmarks show that our FSNet is robust to various challenging scenarios (e.g., motion blur and occlusion) and compares well with leading methods for both video object segmentation and video salient object detection. The project is publicly available at https://github.com/GewelsJI/FSNet.
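Bidirectional (full-duplex) message passing can be sketched as two scaled dot-product cross-attentions, one in each direction. This is a generic illustration, not RCAM itself. It assumes both token lists share the same feature dimension, uses no learned query/key/value projections, and adds each message back to its own stream as a residual.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]


def cross_attention(queries: Matrix, keys_values: Matrix) -> Matrix:
    """Scaled dot-product attention where tokens of one modality query the other.

    Keys and values are the other modality's tokens themselves (no projections).
    """
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys_values]
        weights = softmax(scores)
        out.append([sum(w * kv[j] for w, kv in zip(weights, keys_values)) for j in range(d)])
    return out


def bidirectional_exchange(appearance: Matrix, motion: Matrix) -> Tuple[Matrix, Matrix]:
    """One full-duplex step: appearance attends to motion and motion attends to appearance.

    Each message is added back to its own stream as a residual (an assumed design choice).
    """
    app_msg = cross_attention(appearance, motion)
    mot_msg = cross_attention(motion, appearance)
    new_app = [[a + b for a, b in zip(x, y)] for x, y in zip(appearance, app_msg)]
    new_mot = [[a + b for a, b in zip(x, y)] for x, y in zip(motion, mot_msg)]
    return new_app, new_mot
```

Unlike a simplex design, where only one stream is updated from the other, both streams change here in the same step.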
Funding: Partially supported by the National Natural Science Foundation of China (Grant Nos. 61802197, 62072449, and 61632003); the Science and Technology Development Fund, Macao SAR (Grant Nos. 0018/2019/AKP and SKL-IOTSC(UM)-2021-2023); the Guangdong Science and Technology Department (Grant No. 2020B1515130001); and the University of Macao (Grant Nos. MYRG2020-00253-FST and MYRG2022-00059-FST).
Abstract: We present a lightweight and efficient semi-supervised video object segmentation network based on the space-time memory framework. To some extent, our method solves two difficulties encountered in traditional video object segmentation: the computation time for a single frame is too long, and the segmentation of the current frame should use more information from past frames. The algorithm uses a global context (GC) module to achieve high-performance, real-time segmentation. The GC module can effectively integrate multi-frame image information without increasing memory usage and can process each frame in real time. Moreover, because the predicted mask of the previous frame is helpful for segmenting the current frame, we input it into a spatial constraint module (SCM), which constrains the areas of segments in the current frame. The SCM effectively alleviates mismatching of similar targets yet consumes few additional resources. We added a refinement module to the decoder to improve boundary segmentation. Our model achieves state-of-the-art results on various datasets, scoring 80.1% on YouTube-VOS 2018 and a J&F score of 78.0% on DAVIS 2017, while taking 0.05 s per frame on the DAVIS 2016 validation dataset.
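One way a memory module can integrate many frames at constant memory cost is to summarize each frame as a fixed-size matrix. Each frame's keys and values become K^T V, the module keeps a running mean of these matrices, and queries read from it with a matrix product. The sketch below assumes that formulation. The class name, the per-pixel normalization, and the running-mean update are illustrative choices, not details given in the abstract.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


class GlobalContextMemory:
    """Keeps one d_k x d_v context matrix instead of storing every past frame.

    Assumption: each frame contributes K^T V / N (keys: N x d_k, values: N x d_v),
    and the stored context is the running mean of these per-frame matrices.
    """

    def __init__(self, d_k: int, d_v: int) -> None:
        self.context = [[0.0] * d_v for _ in range(d_k)]
        self.frames = 0

    def add_frame(self, keys: Matrix, values: Matrix) -> None:
        kv = matmul(transpose(keys), values)  # d_k x d_v, independent of N
        n = len(keys)
        self.frames += 1
        for i, row in enumerate(kv):
            for j, v in enumerate(row):
                # incremental mean: c_t = c_{t-1} + (x_t - c_{t-1}) / t
                self.context[i][j] += (v / n - self.context[i][j]) / self.frames

    def read(self, queries: Matrix) -> Matrix:
        """Each query pixel (a 1 x d_k row) reads a d_v feature as q @ context."""
        return matmul(queries, self.context)
```

The stored state is always d_k x d_v, however many frames have been added.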
Funding: The PID2022‐137451OB‐I00 and PID2022‐137629OA‐I00 projects funded by MICIU/AEI/10.13039/501100011033 and by ERDF/EU.
Abstract: Cancer is one of the leading causes of death in the world, and radiotherapy is one of the treatment options. Radiotherapy planning starts with delineating the affected area from healthy organs, called organs at risk (OAR). A new approach to automatic OAR segmentation in the chest cavity in Computed Tomography (CT) images is presented. The proposed approach is based on a modified U-Net architecture with a ResNet-34 encoder, which is the baseline adopted in this work. A new two-branch CS-SA U-Net architecture is proposed. It consists of two parallel U-Net models in which self-attention blocks that use cosine similarity as the query-key similarity function (CS-SA blocks) are inserted between the encoder and decoder, enabling the use of consistency regularisation. The proposed solution demonstrates state-of-the-art performance for OAR segmentation in CT images on the publicly available SegTHOR benchmark dataset in terms of the Dice coefficient (oesophagus 0.8714, heart 0.9516, trachea 0.9286, aorta 0.9510) and Hausdorff distance (oesophagus 0.2541, heart 0.1514, trachea 0.1722, aorta 0.1114), and significantly outperforms the baseline. The approach is shown to be viable for improving the quality of OAR segmentation for radiotherapy planning.
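Cosine-similarity attention replaces the dot-product score with the cosine of the angle between query and key. This bounds the attention logits to [-1/tau, 1/tau] regardless of feature magnitude. The sketch below is a single-head, projection-free illustration. The temperature `tau` is a hypothetical parameter, and the code is not the CS-SA implementation used in the paper.

```python
import math
from typing import List

Matrix = List[List[float]]


def _normalize(v: List[float], eps: float = 1e-8) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / (norm + eps) for x in v]


def cosine_self_attention(tokens: Matrix, tau: float = 0.1) -> Matrix:
    """Self-attention whose query-key score is cosine similarity divided by tau.

    Assumptions: queries, keys, and values are the tokens themselves (no learned
    projections), and tau is a hypothetical temperature.
    """
    unit = [_normalize(t) for t in tokens]
    dim = len(tokens[0])
    out = []
    for q in unit:
        scores = [sum(a * b for a, b in zip(q, k)) / tau for k in unit]
        m = max(scores)  # stabilize the softmax
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        # values are the original (unnormalized) tokens
        out.append([sum(w * t[j] for w, t in zip(weights, tokens)) for j in range(dim)])
    return out
```

Because the scores ignore vector length, scaling one token by 10 does not change which tokens it attends to.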
Funding: Supported by the National Natural Science Foundation of China (Nos. 60772134, 60902081, and 60902052), the 111 Project (No. B08038), and the Fundamental Research Funds for the Central Universities (No. 72105457).
Abstract: A novel moving object segmentation method is proposed in this paper. A modified three-dimensional recursive search (3DRS) algorithm is used to obtain accurate motion information. A motion feature descriptor (MFD) is designed to describe the motion features of each block in a picture based on motion intensity, motion in occlusion areas, and motion correlation among neighbouring blocks. Then, a fuzzy C-means (FCM) clustering algorithm is applied to these MFDs to segment moving objects. Moreover, a new parameter, called the gathering degree, is used to distinguish foreground moving objects from background motion. Experimental results demonstrate the effectiveness of the proposed method.
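Fuzzy C-means alternates between updating soft memberships and membership-weighted cluster centers. The sketch below is the standard algorithm with fuzzifier `m`. It assumes the MFDs are plain numeric vectors and that the caller supplies the initial centers. It does not model the gathering-degree step.

```python
from typing import List, Tuple

Vector = List[float]


def _dist2(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fuzzy_c_means(points: List[Vector], centers: List[Vector], m: float = 2.0,
                  iters: int = 50, eps: float = 1e-12) -> Tuple[List[Vector], List[Vector]]:
    """Standard FCM. memberships[i][k] is the degree to which point i belongs to cluster k.

    points: feature vectors (e.g., one descriptor per block); centers: initial centers;
    m: fuzzifier (> 1). Returns (centers, memberships).
    """
    c = len(centers)
    centers = [list(v) for v in centers]
    u: List[Vector] = []
    for _ in range(iters):
        # Membership update: u_ik = 1 / sum_j (d_ik / d_ij)^(2/(m-1)).
        # d below is a *squared* distance, so the exponent becomes 1/(m-1).
        u = []
        for p in points:
            d = [_dist2(p, ck) + eps for ck in centers]
            u.append([1.0 / sum((d[k] / d[j]) ** (1.0 / (m - 1)) for j in range(c))
                      for k in range(c)])
        # Center update: mean of the points weighted by u_ik^m
        for k in range(c):
            w = [u[i][k] ** m for i in range(len(points))]
            total = sum(w)
            centers[k] = [sum(wi * p[dim] for wi, p in zip(w, points)) / total
                          for dim in range(len(points[0]))]
    return centers, u
```

With one-dimensional features around 0.1 and 5.1, the two centers converge to roughly those values. Each point then ends up with a membership close to 1 in its own cluster.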
Abstract: 3D object recognition is a challenging task for intelligent and robotic systems in industrial and home indoor environments. It is critical for such systems to recognize and segment the 3D object instances that they frequently encounter. The computer vision, graphics, and machine learning fields have all given it a lot of attention. Traditionally, 3D segmentation was done with hand-crafted features and designed approaches that did not achieve acceptable performance and could not be generalized to large-scale data. Owing to their great success in 2D computer vision, deep learning approaches have lately become the preferred method for 3D segmentation challenges. However, the task of instance segmentation is currently less explored. In this paper, we propose a novel deep learning approach for efficient 3D instance segmentation using red green blue and depth (RGB-D) data. The 2D region-based convolutional neural network (Mask R-CNN) deep learning model with a point-based rendering module is adapted to integrate depth information to recognize and segment 3D instances of objects. To generate 3D point cloud coordinates (x, y, z), the segmented 2D pixels (u, v) of recognized object regions in the RGB image are merged with the corresponding (u, v) points of the depth image. Moreover, we conducted experiments and analysis to compare our proposed method from various viewpoints and distances. The experiments show that the proposed 3D object recognition and instance segmentation are sufficiently beneficial to support object handling in robotic and intelligent systems.
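Turning segmented pixels into 3D points is typically a pinhole-camera back-projection: z comes from the depth image, and x and y follow from the camera intrinsics. The sketch below assumes the depth image is already registered to the RGB image. It also assumes the intrinsics `fx`, `fy`, `cx`, `cy` and the scale `depth_scale` are known. These parameter names are illustrative.

```python
from typing import List, Sequence, Tuple


def backproject(pixels: Sequence[Tuple[int, int]], depth: Sequence[Sequence[float]],
                fx: float, fy: float, cx: float, cy: float,
                depth_scale: float = 1.0) -> List[Tuple[float, float, float]]:
    """Map masked (u, v) pixels to camera-frame (x, y, z) with a pinhole model.

    Assumes depth[v][u] holds the raw depth registered to the RGB pixel (u, v),
    with z = raw * depth_scale. Pixels with no depth reading (z <= 0) are skipped.
    """
    points = []
    for u, v in pixels:
        z = depth[v][u] * depth_scale
        if z <= 0:
            continue  # missing depth
        x = (u - cx) * z / fx
        y = (v - cy) * z / fy
        points.append((x, y, z))
    return points
```

For example, the pixel at the principal point (cx, cy) with depth 2 maps to (0, 0, 2).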
Abstract: A fast interactive segmentation algorithm for image sequences based on relative fuzzy connectedness is presented. Compared with the original algorithm, the proposed one achieves the same accuracy while segmenting a single image three times faster. Meanwhile, this fast segmentation algorithm is extended from a single object to multiple objects and from single images to image sequences. Thus, multiple objects can be segmented from a complex background, and image sequences can be segmented in batches. In addition, a post-processing scheme is incorporated into the algorithm, which extracts a smooth, one-pixel-wide edge for each segmented object. The experimental results illustrate that the proposed algorithm can quickly and reliably obtain the object regions of interest from medical images or image sequences, as well as from man-made images, with only a little interaction.
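Relative fuzzy connectedness can be sketched in two steps. First, for each seed, compute the strength of the best path to every pixel, defined as the maximum over paths of the weakest-link affinity. Second, assign each pixel to the seed with the highest strength. The code below uses a Dijkstra-style max-min propagation on a 4-connected grid. The intensity-based `affinity` function is a hypothetical choice, and none of the paper's speed-ups or edge post-processing are included.

```python
import heapq
from typing import Dict, List, Sequence, Tuple

Pixel = Tuple[int, int]  # (row, col)


def affinity(a: float, b: float, value_range: float) -> float:
    """Hypothetical intensity-based affinity in [0, 1]; 1 for identical neighbours."""
    return max(0.0, 1.0 - abs(a - b) / value_range)


def connectedness_map(img: Sequence[Sequence[float]], seed: Pixel,
                      value_range: float) -> List[List[float]]:
    """Best-path strength (max over paths of the min affinity) from seed to every pixel."""
    h, w = len(img), len(img[0])
    strength = [[0.0] * w for _ in range(h)]
    strength[seed[0]][seed[1]] = 1.0
    heap = [(-1.0, seed)]  # max-heap via negated strengths
    while heap:
        neg_s, (r, c) = heapq.heappop(heap)
        s = -neg_s
        if s < strength[r][c]:
            continue  # stale entry: a stronger path was already found
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w:
                cand = min(s, affinity(img[r][c], img[nr][nc], value_range))
                if cand > strength[nr][nc]:
                    strength[nr][nc] = cand
                    heapq.heappush(heap, (-cand, (nr, nc)))
    return strength


def relative_fc_labels(img: Sequence[Sequence[float]], seeds: Dict[str, Pixel],
                       value_range: float = 255.0) -> List[List[str]]:
    """Label each pixel with the seed (object) whose connectedness to it is strongest."""
    maps = {name: connectedness_map(img, p, value_range) for name, p in seeds.items()}
    h, w = len(img), len(img[0])
    return [[max(maps, key=lambda n: maps[n][r][c]) for c in range(w)] for r in range(h)]
```

Ties go to the seed listed first in `seeds`, which is one simple convention among several possible ones.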
Funding: The paper is supported by the Research Foundation for Outstanding Young Teachers, China University of Geosciences (Wuhan) (No. CUGQNL0616); the Research Foundation for State Key Laboratory of Geological Processes and Mineral Resources (No. MGMR2002-02); and the Hubei Provincial Department of Education (B).
Abstract: Urban land provides a suitable location for various economic activities, which affect the development of surrounding areas. With rapid industrialization and urbanization, land-use conflicts have become more noticeable. Urban administrators and decision-makers seek modern methods and technology to provide information support for urban growth. Recently, with the fast development of high-resolution sensor technology, more relevant data can be obtained, which is an advantage in studying the sustainable development of urban land use. However, these data are only information sources and are a mixture of "information" and "noise". Processing, analysis, and information extraction from remote sensing data are necessary to provide useful information. This paper extracts urban land-use information from a high-resolution image by using the multi-feature information of image objects, adopting an object-oriented image analysis approach and multi-scale image segmentation technology. A classification and extraction model is set up based on the multiple features of the image objects, in order to provide information for reasonable planning and effective management. This new image analysis approach offers a satisfactory solution for extracting information quickly and efficiently.
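Object-oriented analysis classifies segments rather than individual pixels. The sketch below is an illustration only. It takes a label map from some prior segmentation and computes two per-object features: the mean value of one band and the area. It then applies a hypothetical rule set. The band (e.g., a vegetation index), the thresholds, and the class names are assumptions, not the paper's model.

```python
from collections import defaultdict
from typing import Dict, Sequence


def object_features(labels: Sequence[Sequence[int]],
                    band: Sequence[Sequence[float]]) -> Dict[int, Dict[str, float]]:
    """Per-object features: mean band value and area in pixels."""
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for lab_row, val_row in zip(labels, band):
        for lab, val in zip(lab_row, val_row):
            sums[lab] += val
            counts[lab] += 1
    return {lab: {"mean": sums[lab] / counts[lab], "area": float(counts[lab])}
            for lab in counts}


def classify_objects(features: Dict[int, Dict[str, float]],
                     veg_threshold: float = 0.3, min_built_area: float = 4.0) -> Dict[int, str]:
    """Hypothetical rules: high band mean -> vegetation; large low-mean objects -> built-up."""
    classes = {}
    for lab, f in features.items():
        if f["mean"] >= veg_threshold:
            classes[lab] = "vegetation"
        elif f["area"] >= min_built_area:
            classes[lab] = "built-up"
        else:
            classes[lab] = "other"
    return classes
```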
Abstract: This paper proposes a motion-based region growing segmentation scheme for object-based video coding, which segments an image into homogeneous regions characterized by coherent motion. It adopts a block matching algorithm to estimate motion vectors. It also uses morphological tools, such as open-close by reconstruction and the region-growing version of the watershed algorithm, for spatial segmentation to improve the temporal segmentation. To obtain reliable motion vectors, this paper also proposes a change detection algorithm and a multi-candidate pre-screening motion estimation method. Preliminary simulation results demonstrate that the proposed scheme is feasible. The main advantage of the scheme is its low computational load.
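Block matching estimates a motion vector for each block by searching a window in the reference frame for the displacement with the lowest matching cost. The sketch below is plain exhaustive search with the sum of absolute differences (SAD) as the cost. The block size and search range are illustrative defaults. The change-detection and multi-candidate screening steps are not modeled.

```python
from typing import Optional, Sequence, Tuple

Frame = Sequence[Sequence[int]]


def sad(cur: Frame, ref: Frame, top: int, left: int, dy: int, dx: int, size: int) -> int:
    """Sum of absolute differences between a block in cur and a displaced block in ref."""
    return sum(abs(cur[top + i][left + j] - ref[top + dy + i][left + dx + j])
               for i in range(size) for j in range(size))


def block_motion_vector(cur: Frame, ref: Frame, top: int, left: int,
                        size: int = 2, search: int = 2) -> Tuple[int, int]:
    """Exhaustive search for the (dy, dx) in [-search, search]^2 that minimises SAD.

    Displacements that would move the block outside the reference frame are skipped.
    """
    h, w = len(ref), len(ref[0])
    best: Tuple[int, int] = (0, 0)
    best_cost: Optional[int] = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            if not (0 <= top + dy and top + dy + size <= h and
                    0 <= left + dx and left + dx + size <= w):
                continue
            cost = sad(cur, ref, top, left, dy, dx, size)
            if best_cost is None or cost < best_cost:
                best, best_cost = (dy, dx), cost
    return best
```

Full search is the simplest and most expensive variant. Cheaper schemes prune the candidate set, which fits the scheme's stated aim of a low computational load.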
Funding: Financially supported by a grant from the National Natural Science Foundation of China (No. 31300533).
Abstract: Object-based classification differentiates forest gaps from canopies at a large regional scale by using remote sensing data. To study the segmentation and classification processes of object-based forest gap classification at a regional scale, we sampled a natural secondary forest in northeast China at Maoershan Experimental Forest Farm. Airborne light detection and ranging (LiDAR; 3.7 points/m²) data were collected as the original data source, and the canopy height model (CHM) and topographic dataset were extracted from the LiDAR data. The accuracy of object-based forest gap classification depends on the preceding segmentation. Thus, our first step was to define 10 different scale parameters for CHM image segmentation. After image segmentation, a machine learning classification method was used to classify three kinds of object classes, namely forest gaps, tree canopies, and others. The common support vector machine (SVM) classifier with the radial basis function (RBF) kernel was first adopted to test the effect of classification features (vegetation height features and some typical topographic features) on forest gap classification. Then different classifiers (KNN, Bayes, decision tree, and SVM with a linear kernel) were adopted to compare the effect of classifiers on machine learning forest gap classification. Segmentation accuracy and classification accuracy were evaluated using Möller's method and confusion matrices, respectively. The scale parameter had a significant effect on object-based forest gap segmentation and classification. Classification accuracies at different scales revealed two optimal scales (10 and 20) that provided similar accuracy, with the scale of 10 yielding slightly greater accuracy than 20.
The classification using the combination of height features and the SVM classifier with a linear kernel reached an accuracy of 91% at the optimal scale parameter of 10. This was the highest among the classifiers compared, which also included SVM RBF (90%), decision tree (90%), Bayes (90%), and KNN (87%). The classifiers had no significant effect on forest gap classification, but fewer parameters in the classifier equation and a higher operating speed probably lead to higher accuracy of the final classifications. Our results confirm that object-based classification can extract forest gaps at a large regional scale with appropriate classification features and classifiers using LiDAR data. We note, however, that satisfactory forest gap classification ultimately depends on determining the optimal scale(s) of segmentation.
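The sketch below computes classification accuracy from a confusion matrix, using the three object classes named in the abstract. Overall accuracy is the diagonal sum divided by the total count. Producer's and user's accuracy are standard per-class measures added here for illustration. They are not claimed to be the metrics reported in the study.

```python
from typing import Dict, List, Sequence, Tuple


def confusion_matrix(y_true: Sequence[str], y_pred: Sequence[str],
                     classes: Sequence[str]) -> List[List[int]]:
    """Rows are reference classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    cm = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        cm[index[t]][index[p]] += 1
    return cm


def accuracy_report(cm: List[List[int]], classes: Sequence[str]
                    ) -> Tuple[float, Dict[str, float], Dict[str, float]]:
    """Overall accuracy plus per-class producer's (row-wise) and user's (column-wise) accuracy."""
    n = sum(map(sum, cm))
    overall = sum(cm[i][i] for i in range(len(cm))) / n
    producers = {c: cm[i][i] / max(1, sum(cm[i])) for i, c in enumerate(classes)}
    users = {c: cm[i][i] / max(1, sum(row[i] for row in cm)) for i, c in enumerate(classes)}
    return overall, producers, users
```

For example, with five objects, one of which is a forest gap mislabeled as canopy, the overall accuracy is 0.8. The producer's accuracy for "gap" drops to 0.5.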