As the field of autonomous driving evolves, real-time semantic segmentation has become a crucial part of computer vision tasks. However, most existing methods use lightweight convolution to reduce the computational effort, resulting in lower accuracy. To address this problem, we construct TBANet, a network with an encoder-decoder structure for efficient feature extraction. In the encoder part, the TBA module is designed to extract details and the ETBA module is used to learn semantic representations in a high-dimensional space. In the decoder part, we design a combination of multiple upsampling methods to aggregate features with less computational overhead. We validate the efficiency of TBANet on the Cityscapes dataset. It achieves 75.1% mean Intersection over Union (mIoU) with only 2.07 million parameters and can reach 90.3 frames per second (FPS).
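Several abstracts in this listing report mean Intersection over Union (mIoU). As a reminder of what that number measures, here is a minimal sketch in plain Python. The function name `mean_iou`, the flat label lists, and the choice to skip classes absent from both ground truth and prediction are illustrative assumptions, not the evaluation code of any paper listed here.

```python
def mean_iou(y_true, y_pred, num_classes):
    """Average the per-class IoU = TP / (TP + FP + FN) over observed classes."""
    # confusion[t][p] counts pixels whose true class is t and predicted class is p.
    confusion = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        confusion[t][p] += 1

    ious = []
    for c in range(num_classes):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                  # true class c, predicted otherwise
        fp = sum(row[c] for row in confusion) - tp   # predicted c, true class differs
        union = tp + fp + fn
        if union > 0:  # assumption: ignore classes that never occur
            ious.append(tp / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: six "pixels", three classes.
labels = [0, 0, 1, 1, 2, 2]
preds = [0, 1, 1, 1, 2, 0]
print(mean_iou(labels, preds, num_classes=3))
```

In the toy example the per-class IoUs are 1/3, 2/3 and 1/2, so the mean is 0.5. Benchmark toolkits usually accumulate the confusion matrix over the whole test set before dividing, rather than averaging per-image scores.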
The application of unmanned driving in the Internet of Things is one of the concrete manifestations of artificial intelligence technology. Image semantic segmentation can help an unmanned driving system by enabling road accessibility analysis. Semantic segmentation is also a challenging technology for image understanding and scene parsing. In this paper, we focus on the challenging task of real-time semantic segmentation and propose a novel fast architecture named DuFNet. Starting from the existing work of the Bilateral Segmentation Network (BiSeNet), DuFNet introduces a novel Semantic Information Flow (SIF) structure for context information and a novel Fringe Information Flow (FIF) structure for spatial information. We also propose two kinds of SIF, with cascaded and parallel structures respectively. The SIF encodes the input stage by stage in the ResNet18 backbone and provides context information for the feature fusion module. Features from earlier stages usually contain rich low-level details, whereas later stages carry high-level semantics. The multiple convolutions embedded in the Parallel SIF aggregate the corresponding features among different stages and generate a powerful global context representation at lower computational cost. The FIF consists of a pooling layer and an upsampling operator followed by a projection convolution layer. This concise component provides more spatial details for the network. Compared with BiSeNet, our work achieves faster speed and comparable performance, with 72.34% mIoU and 78 FPS on the Cityscapes dataset using the ResNet18 backbone.
In recent years, semantic segmentation on 3D point cloud data has attracted much attention. Unlike 2D images, where pixels are distributed regularly in the image domain, 3D point clouds in non-Euclidean space are irregular and inherently sparse. It is therefore very difficult to extract long-range contexts and effectively aggregate local features for semantic segmentation in 3D point cloud space. Most current methods focus on either local feature aggregation or long-range context dependency, but fail to directly establish a global-local feature extractor for point cloud semantic segmentation. In this paper, we propose a Transformer-based stratified graph convolutional network (SGT-Net), which enlarges the effective receptive field and builds direct long-range dependency. Specifically, we first propose a novel dense-sparse sampling strategy that provides dense local vertices and sparse long-distance vertices for the subsequent graph convolutional network (GCN). Secondly, we propose a multi-key self-attention mechanism based on the Transformer to further augment the weights of crucial neighboring relationships and enlarge the effective receptive field. In addition, to further improve the efficiency of the network, we propose a similarity measurement module to determine whether the neighborhood near the center point is effective. We demonstrate the validity and superiority of our method on the S3DIS and ShapeNet datasets. Through ablation experiments and segmentation visualization, we verify that the SGT model can improve the performance of point cloud semantic segmentation.
Few-shot semantic segmentation aims at training a model that can segment novel classes in a query image with only a few densely annotated support exemplars. It remains a challenge because of large intra-class variations between the support and query images. Existing approaches utilize 4D convolutions to mine semantic correspondence between the support and query images. However, they still suffer from heavy computation, sparse correspondence, and large memory. We propose the axial assembled correspondence network (AACNet) to alleviate these issues. The key point of AACNet is the proposed axial assembled 4D kernel, which constructs the basic block for the semantic correspondence encoder (SCE). Furthermore, we propose the deblurring equations to provide more robust correspondence for the aforementioned SCE and design a novel fusion module to mix correspondences in a learnable manner. Experiments on PASCAL-5^i reveal that our AACNet achieves a mean intersection-over-union score of 65.9% for 1-shot segmentation and 70.6% for 5-shot segmentation, surpassing the state-of-the-art method by 5.8% and 5.0% respectively.
Early detection of Covid-19 is essential because of its high rate of infection, which has affected tens of millions of people, and its high mortality, reported at 7%. For that purpose, a multi-stage model was developed. The first stage optimizes the images using dynamic adaptive histogram equalization, performs semantic segmentation using DeepLabv3Plus, and then augments the data by flipping it horizontally, rotating it, and flipping it vertically. The second stage builds a custom convolutional neural network model using several models pre-trained on ImageNet. Finally, the model compares the pre-trained data to the new output, while repeatedly trimming the best-performing models to reduce complexity and improve memory efficiency. Several experiments were done using different techniques and parameters. The proposed model achieved an average accuracy of 99.6% and an area under the curve of 0.996 in Covid-19 detection. This paper discusses how to train a customized intelligent convolutional neural network using various parameters on a set of chest X-rays with an accuracy of 99.6%.
In recent years, the Internet of Things (IoT) has gradually developed applications such as collecting sensory data and building intelligent services, which has led to an explosion in mobile data traffic. Meanwhile, with the rapid development of artificial intelligence, semantic communication has attracted great attention as a new communication paradigm. For IoT devices, however, processing image information efficiently in real time is an essential task for the rapid transmission of semantic information. As the number of model parameters in deep learning methods increases, the model inference time on sensor devices continues to grow. In contrast, the Pulse Coupled Neural Network (PCNN) has fewer parameters, making it more suitable for real-time scene tasks such as image segmentation, which lays the foundation for real-time, effective, and accurate image transmission. However, the parameters of PCNN are determined by trial and error, which limits its application. To overcome this limitation, an Improved Pulse Coupled Neural Network (IPCNN) model is proposed in this work. The IPCNN constructs the connection between the static properties of the input image and the dynamic properties of the neurons, and all its parameters are set adaptively, which avoids the inconvenience of manual setting in traditional methods and improves the adaptability of the parameters to different types of images. Experimental segmentation results demonstrate the validity and efficiency of the proposed self-adaptive parameter setting method of IPCNN on gray images and natural images from the Matlab and Berkeley Segmentation Datasets. The IPCNN method achieves a better segmentation result without training, providing a new solution for the real-time transmission of image semantic information.
Deep learning based methods have been successfully applied to semantic segmentation of optical remote sensing images. However, as more and more remote sensing data becomes available, it is a new challenge to comprehensively utilize multi-modal remote sensing data to break through the performance bottleneck of single-modal interpretation. In addition, semantic segmentation and height estimation in remote sensing data are two strongly correlated tasks, but existing methods usually study them separately, which leads to high computational resource overhead. To this end, we propose a Multi-Task learning framework for Multi-Modal remote sensing images (MM_MT). Specifically, we design a Cross-Modal Feature Fusion (CMFF) method, which aggregates complementary information from different modalities to improve the accuracy of semantic segmentation and height estimation. Besides, a dual-stream multi-task learning method is introduced for Joint Semantic Segmentation and Height Estimation (JSSHE), extracting common features in a shared network to save time and resources, and then learning task-specific features in two task branches. Experimental results on the public multi-modal remote sensing image dataset Potsdam show that, compared to training the two tasks independently, multi-task learning saves 20% of training time and achieves competitive performance, with an mIoU of 83.02% for semantic segmentation and an accuracy of 95.26% for height estimation.
Medical image semantic segmentation faces multi-scale changes of segmentation targets, noise interference, rough segmentation results, and a slow training process. To address these problems, a multi-scale residual aggregation U-shaped attention network, MAAUNet (MultiRes aggregation attention UNet), is proposed based on MultiResUNet. Firstly, aggregated connections are introduced, extending the original feature aggregation at the same level: the skip connections are redesigned to aggregate features of different semantic scales at the decoder subnetwork, which further reduces the semantic gap that may exist across skip connections. Secondly, a convolutional block attention module is added after the multi-scale convolution module to focus and integrate features along the two attention directions of channel and space, adaptively optimizing the intermediate feature map. Finally, the original convolution block is improved. The convolution channels are expanded with a series convolution structure so that they complement each other and extract richer spatial features. Residual connections are retained, and the convolution block becomes a multi-channel convolution block that lets the model extract multi-scale spatial features. The experimental results show that MAAUNet is strongly competitive on challenging datasets and shows good segmentation performance and stability when dealing with multi-scale input and noise interference.
Deep convolutional neural networks have made great progress in the field of semantic segmentation. Because of their fixed convolution kernel geometry, standard convolutional neural networks have a limited ability to model geometric transformations. Therefore, a deformable convolution is introduced to enhance the adaptability of convolutional networks to spatial transformation. Moreover, because of the pooling layers in their architecture, deep convolutional neural networks cannot adequately segment local objects at the output layer. To overcome this shortcoming, the coarse segmentation predictions of the network's output layer are refined by a fully connected conditional random field to improve segmentation quality. The proposed method can easily be trained end-to-end using standard backpropagation. Finally, the proposed method is tested on the ISPRS dataset. The results show that the proposed method can effectively overcome the influence of the complex structure of the segmentation object and obtain state-of-the-art accuracy on the ISPRS Vaihingen 2D semantic labeling dataset.
Semantic segmentation is a pixel-level classification task, and contextual information has an important impact on segmentation performance. In order to capture richer contextual information, we adopt ResNet as the backbone network and design an encoder-decoder architecture based on a multidimensional attention (MDA) module and a multiscale upsampling (MSU) module. The MDA module calculates the attention matrices of the three dimensions to capture the dependency of each position, and adaptively captures the image features. The MSU module adopts parallel branches to capture the multiscale features of the images, and multiscale feature aggregation can enhance contextual information. A series of experiments demonstrate the validity of the model on the Cityscapes and CamVid datasets.
With the rising frequency and severity of wildfires across the globe, researchers have been actively searching for a reliable solution for early-stage forest fire detection. In recent years, Convolutional Neural Networks (CNNs) have demonstrated outstanding performance in computer vision-based object detection tasks, including forest fire detection. Using CNNs to detect forest fires by segmenting both flame and smoke pixels can provide not only early and accurate detection but also additional information such as the size, spread, location, and movement of the fire. However, CNN-based segmentation networks are computationally demanding and can be difficult to deploy onboard lightweight mobile platforms, such as an Uncrewed Aerial Vehicle (UAV). To address this issue, this paper proposes a new efficient upsampling technique based on transposed convolution to make segmentation CNNs lighter. The proposed technique, named Reversed Depthwise Separable Transposed Convolution (RDSTC), achieved F1-scores of 0.78 for smoke and 0.74 for flame, outperforming U-Net networks with bilinear upsampling, transposed convolution, and CARAFE upsampling. Additionally, a Multi-signature Fire Detection Network (MsFireD-Net) is proposed in this paper, with 93% fewer parameters and 94% fewer computations than the RDSTC U-Net. Despite being such a lightweight and efficient network, MsFireD-Net demonstrated strong results against the other U-Net-based networks.
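To give a feel for why depthwise separable layers make a network lighter, the sketch below compares weight counts for a standard k×k convolution and the generic depthwise separable factorization (a per-channel k×k filter plus a 1×1 pointwise mix). This is the textbook factorization only: the abstract does not describe the internal layout of RDSTC, so nothing here should be read as that layer's actual design, and the channel sizes are made-up illustration values. Biases are ignored.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (a transposed conv has the same count)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Weights in a depthwise k x k step plus a 1 x 1 pointwise step."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Hypothetical layer sizes chosen only for illustration.
k, c_in, c_out = 3, 64, 32
std = standard_conv_params(k, c_in, c_out)
sep = depthwise_separable_params(k, c_in, c_out)
print(std, sep, round(sep / std, 3))
```

For these sizes the standard layer has 18,432 weights and the separable one 2,624, roughly 14% of the original. The saving grows with the kernel size and the number of output channels.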
Cardiomyopathy is one of the most serious public health threats. Precise structural and functional cardiac measurement is an essential step for clinical diagnosis and follow-up treatment planning. Cardiologists are often required to draw the endocardial and epicardial contours of the left ventricle (LV) manually during routine clinical diagnosis or treatment planning. This task is time-consuming and error-prone. Therefore, it is necessary to develop a fully automated end-to-end semantic segmentation method for cardiac magnetic resonance (CMR) imaging datasets. However, due to the low image quality and the deformation caused by the heartbeat, there is no effective tool for the fully automated end-to-end cardiac segmentation task. In this work, we propose a multi-scale segmentation network (MSSN) for left ventricle segmentation. It can effectively learn myocardium and blood pool structure representations from 2D short-axis CMR image slices in a multi-scale way. Specifically, our method employs both parallel and serial dilated convolution layers with different dilation rates to capture multi-scale semantic features. Moreover, we design graduated up-sampling layers with subpixel layers as the decoder to reconstruct lost spatial information and produce accurate segmentation masks. We validated our method using 164 T1 mapping CMR images and showed that it outperforms advanced convolutional neural network (CNN) models. In terms of validation metrics, we achieved a Dice Similarity Coefficient (DSC) of 78.96%.
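The reason serial and parallel dilated convolutions capture different scales can be seen from simple receptive-field arithmetic. The sketch below assumes stride-1 layers with 3×3 kernels, and the dilation rates 1, 2 and 4 are hypothetical; the abstract does not state the rates MSSN uses.

```python
def effective_kernel(k, dilation):
    """Span covered by one dilated k x k kernel along one axis."""
    return dilation * (k - 1) + 1


def serial_receptive_field(k, dilations):
    """Receptive field of stride-1 dilated convolutions stacked one after another."""
    rf = 1
    for d in dilations:
        rf += (k - 1) * d  # each layer widens the field by (k - 1) * dilation
    return rf


rates = [1, 2, 4]  # assumed dilation rates, for illustration only

# Parallel branches: each branch sees its own scale.
print([effective_kernel(3, d) for d in rates])

# Serial stack: the fields accumulate.
print(serial_receptive_field(3, rates))

# Baseline: three ordinary 3 x 3 layers.
print(serial_receptive_field(3, [1, 1, 1]))
```

With these rates the parallel branches cover 3, 5 and 9 pixels respectively, the serial stack covers 15, and three undilated layers cover only 7, all with the same number of weights per layer.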
In recent years, computer vision has found wide application in maritime surveillance thanks to its sophisticated algorithms and advanced architectures. Automatic ship detection with computer vision techniques provides an efficient means to monitor and track ships in water bodies. Waterways, being an important medium of transport, require continuous monitoring for the protection of national security. Remote sensing satellite images of ships in harbours and water bodies are the image data that help neural network models localize ships and facilitate early identification of possible threats at sea. This paper proposes a deep learning based model capable of distinguishing ship from no-ship images as well as localizing ships in the original images using the bounding box technique. Furthermore, the classified ships are segmented with a deep learning based auto-encoder model. In terms of classification, the proposed model achieves 99.5% validation accuracy and 99.2% training accuracy. The auto-encoder model achieves 85.1% validation accuracy and 84.2% training accuracy. Moreover, the IoU of the segmented images is 0.77. The experimental results reveal that the model is accurate and can be implemented for automatic ship detection in water bodies, taking remote sensing satellite images as input to the computer vision system.
Semantic change detection is an extension of the change detection task: it not only identifies the changed regions but also analyzes the semantic details (labels/categories) of the land area before and after the change. Periodic land change analysis is used in many real-time applications for valuation purposes. The majority of research works focus on Convolutional Neural Networks (CNNs) that analyze changes alone. Semantic information about the changes is missing, so there is no communication between the semantic timelines and the changes detected over the region. To overcome this limitation, a CNN is proposed that incorporates the pre-trained ResNet-34 model into Fully Convolutional Network (FCN) blocks to explore the temporal data of satellite images in different timelines and to analyze the change map between two timelines. The model achieves better results by analyzing the semantic information between the timelines together with localized information collected from skip connections, which helps generate a better change map with the categories that may have changed over a land area across timelines. The proposed model effectively examines semantic changes, such as from-to changes on land over a time period. The experimental results on SECOND (Semantic Change detectiON Dataset) indicate that the proposed model yields a notable improvement in performance over existing approaches, and that it also improves semantic segmentation of images from different timelines and of the changed land areas across timelines.
Although deep neural networks (DNNs) have achieved great success in semantic segmentation tasks, applying them in real time is still challenging. A large number of feature channels, parameters, and floating-point operations make the network sluggish and computationally heavy, which is not desirable for real-time tasks such as robotics and autonomous driving. Most approaches, however, sacrifice spatial resolution to achieve real-time inference speed, resulting in poor performance. In this paper, we propose a light-weight stage-pooling semantic segmentation network (SPSSN), which can efficiently reuse the paramount features from early layers at multiple stages and at different spatial resolutions. SPSSN takes full-resolution input of 2048×1024 pixels, uses only 1.42×10^6 parameters, yields 69.4% mIoU without pre-training, and obtains an inference speed of 59 frames/s on the Cityscapes dataset. Owing to its light-weight architecture, SPSSN can run directly on mobile devices in real time. To demonstrate the effectiveness of the proposed network, we compare our results with those of state-of-the-art networks.
Existing semantic segmentation networks based on the multi-column structure can hardly satisfy the efficiency and precision requirements simultaneously because of their shallow spatial branches. In this paper, we propose a new efficient multi-column network termed LadderNet to address this problem. LadderNet includes two branches: the spatial branch generates a high-resolution output feature map and the context branch encodes accurate semantic information. In particular, we first propose a channel attention fusion block and a global context module to enhance the information encoding ability of the context branch. Subsequently, a new branch fusion method, i.e., fusing some middle feature maps of the context branch into the spatial branch, is developed to increase the depth of the spatial branch. Meanwhile, we design a feature fusing module to enhance the fusion quality of these two branches, leading to a more efficient network. We compare our model with other state-of-the-art methods on the PASCAL VOC 2012 and Cityscapes benchmarks. Experimental results demonstrate that, compared with other state-of-the-art methods, LadderNet achieves an average 1.25% mIoU improvement with comparable or less computation.
The accurate segmentation of retinal vessels is a challenging task due to the presence of various pathologies as well as the low contrast of thin vessels and non-uniform illumination. In recent years, encoder-decoder networks have achieved outstanding performance in retinal vessel segmentation at the cost of high computational complexity. To address the aforementioned challenges and to reduce the computational complexity, we propose a lightweight convolutional neural network (CNN)-based encoder-decoder deep learning model for accurate retinal vessel segmentation. The proposed deep learning model consists of an encoder-decoder architecture along with bottleneck layers that consist of depth-wise squeezing, followed by full convolution, and finally depth-wise stretching. The inspiration for the proposed model is taken from the recently developed Anam-Net model, which was tested on CT images for COVID-19 identification. For our lightweight model, we used a stack of two 3 × 3 convolution layers (without spatial pooling in between) instead of a single 3 × 3 convolution layer as proposed in Anam-Net to increase the receptive field and to reduce the trainable parameters. The proposed method includes fewer filters in all convolutional layers than the original Anam-Net and does not have an increasing number of filters for decreasing resolution. These modifications do not compromise the segmentation accuracy, but they do make the architecture significantly lighter in terms of the number of trainable parameters and computation time. The proposed architecture has comparatively fewer parameters (1.01M) than Anam-Net (4.47M), U-Net (31.05M), SegNet (29.50M), and most of the other recent works.
The proposed model does not require any problem-specific pre- or post-processing, nor does it rely on handcrafted features. In addition, being both accurate in segmentation and lightweight makes the proposed method a suitable candidate for use in screening platforms at the point of care. We evaluated our proposed model on the open-access datasets DRIVE, STARE, and CHASE_DB. The experimental results show that the proposed model outperforms several state-of-the-art methods, such as U-Net and its variants, fully convolutional network (FCN), SegNet, CCNet, ResWNet, residual connection-based encoder-decoder network (RCED-Net), and scale-space approximation network (SSANet), in terms of {Dice coefficient, sensitivity (SN), accuracy (ACC), and area under the ROC curve (AUC)}, with scores of {0.8184, 0.8561, 0.9669, and 0.9868} on the DRIVE dataset, {0.8233, 0.8581, 0.9726, and 0.9901} on the STARE dataset, and {0.8138, 0.8604, 0.9752, and 0.9906} on the CHASE_DB dataset. Additionally, we perform cross-training experiments on the DRIVE and STARE datasets. The results of this experiment indicate the generalization ability and robustness of the proposed model.
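Three of the four metrics quoted above (Dice coefficient, sensitivity, and accuracy) follow directly from the pixel-level confusion counts of a binary vessel mask. The sketch below shows the standard definitions; the function name, the flat 0/1 lists, and the toy masks are illustrative assumptions and not the authors' evaluation code. AUC is left out because it needs per-pixel probability scores rather than a thresholded mask.

```python
def binary_segmentation_metrics(truth, pred):
    """Dice, sensitivity (SN) and accuracy (ACC) for flat 0/1 masks of equal length."""
    tp = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 0)

    total = tp + tn + fp + fn
    return {
        # Dice = 2TP / (2TP + FP + FN): overlap between predicted and true vessel pixels
        "dice": 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 1.0,
        # Sensitivity = TP / (TP + FN): fraction of true vessel pixels that were found
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        # Accuracy = (TP + TN) / all pixels
        "accuracy": (tp + tn) / total if total else 0.0,
    }


# Toy masks: 1 = vessel pixel, 0 = background.
truth = [1, 1, 1, 0, 0, 0, 0, 0]
pred = [1, 1, 0, 1, 0, 0, 0, 0]
print(binary_segmentation_metrics(truth, pred))
```

In the toy masks there are 2 true positives, 1 false negative, 1 false positive and 4 true negatives, giving Dice and sensitivity of 2/3 and accuracy of 0.75. Because vessel pixels are a small minority in fundus images, accuracy tends to sit well above Dice, which matches the pattern in the scores quoted above.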
Image semantic segmentation is an essential technique for studying human behavior through image data. This paper proposes an image semantic segmentation method for human behavior research. Firstly, an end-to-end convolutional neural network architecture is proposed, which consists of a depthwise separable, skip-connected fully convolutional network and a conditional random field network. Skip-connected convolution is then used to classify each pixel in the image, yielding an image semantic segmentation method based on a convolutional neural network. Next, a conditional random field network is used to improve the segmentation of human behavior images, and linear and nonlinear modeling methods based on conditional random field semantic segmentation are proposed. Finally, using the proposed image segmentation network, the input entrepreneurial image data is semantically segmented to obtain the contour features of the person, and images from the medical field are segmented as well. The experimental results show that the image semantic segmentation method is effective. It offers a new way to use image data to study human behavior and can be extended to other research areas.
... semantic information while maintaining spatial detail contexts. Long-range context information plays a crucial role in this scenario. However, the traditional convolution kernel provides only a local and small receptive field. To address the problem, we propose a plug-and-play module aggregating both local and global information (the LGIA module) to capture the high-order relationship between nodes that are far apart. We incorporate both local and global correlations into a hypergraph, which is able to capture high-order relationships between nodes via the concept of a hyperedge connecting a subset of nodes. The local correlation considers neighborhood nodes that are spatially adjacent and similar in the same CNN feature maps of a magnetic resonance (MR) image, and the global correlation is searched from a batch of CNN feature maps of MR images in feature space. The influence of these two correlations on semantic segmentation is complementary. We validated our LGIA module on various CNN segmentation models with a cardiac MR image dataset. Experimental results demonstrate that our approach outperformed several baseline models.
In the context of automated analysis of eye fundus images, it is a common and important fallacy that prior works achieve very high scores in segmentation of lesions. That fallacy is fueled by some reviews reporting very high scores, and perhaps by some confusion over terms. A simple analysis of the details of the few prior works that really do segmentation reveals scores between 7% and 70% in sensitivity for 1 FPI. That is clearly below the performance of medical doctors trained to detect signs of Diabetic Retinopathy, since they can distinguish well the contours of lesions in Eye Fundus Images (EFI). Still, a full segmentation of lesions could be an important step for both visualization and further automated analysis, using rigorous quantification of the areas and numbers of lesions to improve diagnosis. I discuss what prior work really does, using evidence-based analysis, and confront it with segmentation networks, comparing on the terms used by prior work to show that the best performing segmentation network outperforms those prior works. I also compare architectures to understand how the network architecture influences the results. I conclude that, with the correct architecture and tuning, the semantic segmentation network improves by up to 20 percentage points over prior work in the real task of segmentation of lesions. I also conclude that the network architecture and optimizations are important factors and that there are still important limitations in current work.
Abstract: As the field of autonomous driving evolves, real-time semantic segmentation has become a crucial part of computer vision tasks. However, most existing methods use lightweight convolution to reduce the computational effort, resulting in lower accuracy. To address this problem, we construct TBANet, a network with an encoder-decoder structure for efficient feature extraction. In the encoder part, the TBA module is designed to extract details and the ETBA module is used to learn semantic representations in a high-dimensional space. In the decoder part, we design a combination of multiple upsampling methods to aggregate features with less computational overhead. We validate the efficiency of TBANet on the Cityscapes dataset. It achieves 75.1% mean Intersection over Union (mIoU) with only 2.07 million parameters and can reach 90.3 frames per second (FPS).
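Several abstracts in this listing report mIoU, so a minimal sketch of the metric may help. It assumes predictions and ground truth are given as flat lists of integer class labels; the function name `mean_iou` and the toy labels are illustrative and are not taken from any of the listed papers. Benchmark evaluations typically accumulate intersections and unions over a whole dataset rather than a single image.

```python
def mean_iou(pred, target, num_classes):
    """Mean Intersection over Union for flat lists of integer class labels."""
    ious = []
    for c in range(num_classes):
        # Pixels where both prediction and ground truth equal class c.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels where either prediction or ground truth equals class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both prediction and target
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else float("nan")


# Toy example with three classes and six pixels (hypothetical data).
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
print(mean_iou(pred, target, num_classes=3))
```

Per-class IoU values here are 1/3, 2/3, and 1/2, so the mean is 0.5.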
Funding: Supported in part by the National Key R&D Program of China (2021YFF0602104-2, 2020YFB1804604), in part by the 2020 Industrial Internet Innovation and Development Project from the Ministry of Industry and Information Technology of China, and in part by the Fundamental Research Fund for the Central Universities (30918012204, 30920041112).
Abstract: The application of unmanned driving in the Internet of Things is one of the concrete manifestations of artificial intelligence technology. Image semantic segmentation can help the unmanned driving system by achieving road accessibility analysis. Semantic segmentation is also a challenging technology for image understanding and scene parsing. In this paper, we focus on the challenging task of real-time semantic segmentation and propose a novel fast architecture named DuFNet. Starting from the existing work of the Bilateral Segmentation Network (BiSeNet), DuFNet proposes a novel Semantic Information Flow (SIF) structure for context information and a novel Fringe Information Flow (FIF) structure for spatial information. We also propose two kinds of SIF, with cascaded and parallel structures, respectively. The SIF encodes the input stage by stage in the ResNet18 backbone and provides context information for the feature fusion module. Features from earlier stages usually contain rich low-level details, whereas later stages carry high-level semantics. The multiple convolutions embedded in the Parallel SIF aggregate the corresponding features among different stages and generate a powerful global context representation with less computational cost. The FIF consists of a pooling layer and an upsampling operator followed by a projection convolution layer. This concise component provides more spatial details for the network. Compared with BiSeNet, our work achieves faster speed and comparable performance, with 72.34% mIoU accuracy and 78 FPS on the Cityscapes dataset based on the ResNet18 backbone.
Funding: Supported in part by the National Natural Science Foundation of China under Grant Nos. U20A20197 and 62306187, and by the Foundation of the Ministry of Industry and Information Technology (TC220H05X-04).
Abstract: In recent years, semantic segmentation on 3D point cloud data has attracted much attention. Unlike 2D images, where pixels are distributed regularly in the image domain, 3D point clouds in non-Euclidean space are irregular and inherently sparse. Therefore, it is very difficult to extract long-range contexts and effectively aggregate local features for semantic segmentation in 3D point cloud space. Most current methods focus either on local feature aggregation or on long-range context dependency, but fail to directly establish a global-local feature extractor for point cloud semantic segmentation tasks. In this paper, we propose a Transformer-based stratified graph convolutional network (SGT-Net), which enlarges the effective receptive field and builds direct long-range dependency. Specifically, we first propose a novel dense-sparse sampling strategy that provides dense local vertices and sparse long-distance vertices for the subsequent graph convolutional network (GCN). Second, we propose a multi-key self-attention mechanism based on the Transformer to further augment the weights of crucial neighboring relationships and enlarge the effective receptive field. In addition, to further improve the efficiency of the network, we propose a similarity measurement module to determine whether the neighborhood near the center point is effective. We demonstrate the validity and superiority of our method on the S3DIS and ShapeNet datasets. Through ablation experiments and segmentation visualization, we verify that the SGT model can improve the performance of point cloud semantic segmentation.
Funding: Supported in part by the Key Research and Development Program of Guangdong Province (2021B0101200001) and the Guangdong Basic and Applied Basic Research Foundation (2020B1515120071).
Abstract: Few-shot semantic segmentation aims at training a model that can segment novel classes in a query image with only a few densely annotated support exemplars. It remains a challenge because of large intra-class variations between the support and query images. Existing approaches utilize 4D convolutions to mine semantic correspondence between the support and query images. However, they still suffer from heavy computation, sparse correspondence, and large memory use. We propose the axial assembled correspondence network (AACNet) to alleviate these issues. The key point of AACNet is the proposed axial assembled 4D kernel, which constructs the basic block for the semantic correspondence encoder (SCE). Furthermore, we propose deblurring equations to provide more robust correspondence for the SCE and design a novel fusion module to mix correspondences in a learnable manner. Experiments on PASCAL-5^i reveal that AACNet achieves a mean intersection-over-union score of 65.9% for 1-shot segmentation and 70.6% for 5-shot segmentation, surpassing the state-of-the-art method by 5.8% and 5.0%, respectively.
Funding: This work was supported by a National Research Foundation of Korea grant funded by the Korean Government (Ministry of Science and ICT) (NRF-2020R1A2B5B02002478). No additional external funding was received for this study.
Abstract: Early detection of the Covid-19 disease is essential due to its higher rate of infection affecting tens of millions of people, and its high number of deaths also by 7%. For that purpose, a multi-stage model was developed. The first stage optimizes the images using dynamic adaptive histogram equalization, performs semantic segmentation using DeepLabv3Plus, and then augments the data by flipping it horizontally, rotating it, and flipping it vertically. The second stage builds a custom convolutional neural network model using several models pre-trained on ImageNet. Finally, the model compares the pre-trained data to the new output, while repeatedly trimming the best-performing models to reduce complexity and improve memory efficiency. Several experiments were done using different techniques and parameters. The proposed model achieved an average accuracy of 99.6% and an area under the curve of 0.996 in Covid-19 detection. This paper discusses how to train a customized intelligent convolutional neural network using various parameters on a set of chest X-rays with an accuracy of 99.6%.
Funding: Supported in part by the National Key Research and Development Program of China (Grant No. 2019YFA0706200).
Abstract: In recent years, the Internet of Things (IoT) has gradually developed applications such as collecting sensory data and building intelligent services, which has led to an explosion in mobile data traffic. Meanwhile, with the rapid development of artificial intelligence, semantic communication has attracted great attention as a new communication paradigm. For IoT devices, however, processing image information efficiently in real time is an essential task for the rapid transmission of semantic information. With the increase of model parameters in deep learning methods, the model inference time on sensor devices continues to increase. In contrast, the Pulse Coupled Neural Network (PCNN) has fewer parameters, making it more suitable for real-time scene tasks such as image segmentation, which lays the foundation for real-time, effective, and accurate image transmission. However, the parameters of PCNN are determined by trial and error, which limits its application. To overcome this limitation, an Improved Pulse Coupled Neural Network (IPCNN) model is proposed in this work. The IPCNN constructs the connection between the static properties of the input image and the dynamic properties of the neurons, and all its parameters are set adaptively, which avoids the inconvenience of manual setting in traditional methods and improves the adaptability of the parameters to different types of images. Experimental segmentation results demonstrate the validity and efficiency of the proposed self-adaptive parameter setting method of IPCNN on the gray images and natural images from the Matlab and Berkeley Segmentation Datasets. The IPCNN method achieves a better segmentation result without training, providing a new solution for the real-time transmission of image semantic information.
Funding: National Key R&D Program of China (No. 2022ZD0118401).
Abstract: Deep learning based methods have been successfully applied to semantic segmentation of optical remote sensing images. However, as more and more remote sensing data becomes available, it is a new challenge to comprehensively utilize multi-modal remote sensing data to break through the performance bottleneck of single-modal interpretation. In addition, semantic segmentation and height estimation in remote sensing data are two strongly correlated tasks, but existing methods usually study them separately, which leads to high computational resource overhead. To this end, we propose a Multi-Task learning framework for Multi-Modal remote sensing images (MM_MT). Specifically, we design a Cross-Modal Feature Fusion (CMFF) method, which aggregates complementary information of different modalities to improve the accuracy of semantic segmentation and height estimation. In addition, a dual-stream multi-task learning method is introduced for Joint Semantic Segmentation and Height Estimation (JSSHE), extracting common features in a shared network to save time and resources, and then learning task-specific features in two task branches. Experimental results on the public multi-modal remote sensing image dataset Potsdam show that, compared with training the two tasks independently, multi-task learning saves 20% of training time and achieves competitive performance, with an mIoU of 83.02% for semantic segmentation and an accuracy of 95.26% for height estimation.
Funding: National Natural Science Foundation of China (No. 61806006); Jiangsu University Superior Discipline Construction Project.
Abstract: In view of the problems of multi-scale changes of segmentation targets, noise interference, rough segmentation results, and slow training faced by medical image semantic segmentation, a multi-scale residual aggregation U-shaped attention network, MAAUNet (MultiRes aggregation attention UNet), is proposed based on MultiResUNet. First, aggregate connections are introduced, starting from the original feature aggregation at the same level. The skip connections are redesigned to aggregate features of different semantic scales at the decoder subnet, which further addresses the semantic gaps that may exist between skip connections. Second, after the multi-scale convolution module, a convolution block attention module is added to focus and integrate features along the two attention directions of channel and space, adaptively optimizing the intermediate feature map. Finally, the original convolution block is improved. The convolution channels are expanded with a series convolution structure so that they complement each other and extract richer spatial features. Residual connections are retained, and the convolution block is turned into a multi-channel convolution block, so that the model extracts multi-scale spatial features. The experimental results show that MAAUNet is strongly competitive on challenging datasets and shows good segmentation performance and stability when dealing with multi-scale input and noise interference.
Funding: National Key Research and Development Program of China (No. 2017YFC0405806).
Abstract: Deep convolutional neural networks have made great progress in the field of semantic segmentation. Because of the fixed geometry of the convolution kernel, standard convolutional neural networks have limited ability to model geometric transformations. Therefore, deformable convolution is introduced to enhance the adaptability of convolutional networks to spatial transformation. In addition, deep convolutional neural networks cannot adequately segment local objects at the output layer because of the pooling layers used in the network architecture. To overcome this shortcoming, the rough segmentation predictions of the network output layer are processed by fully connected conditional random fields to improve segmentation quality. The proposed method can easily be trained end to end using standard backpropagation algorithms. Finally, the proposed method is tested on the ISPRS dataset. The results show that the proposed method can effectively overcome the influence of the complex structure of the segmentation object and obtain state-of-the-art accuracy on the ISPRS Vaihingen 2D semantic labeling dataset.
Funding: Fundamental Research Fund in Heilongjiang Provincial Universities (Nos. 135409602, 135409102).
Abstract: Semantic segmentation is a pixel-level classification task, and contextual information has an important impact on segmentation performance. In order to capture richer contextual information, we adopt ResNet as the backbone network and design an encoder-decoder architecture based on a multidimensional attention (MDA) module and a multiscale upsampling (MSU) module. The MDA module calculates the attention matrices of the three dimensions to capture the dependency of each position, and adaptively captures the image features. The MSU module adopts parallel branches to capture the multiscale features of the images, and multiscale feature aggregation can enhance contextual information. A series of experiments demonstrate the validity of the model on the Cityscapes and CamVid datasets.
Abstract: With the rising frequency and severity of wildfires across the globe, researchers have been actively searching for a reliable solution for early-stage forest fire detection. In recent years, Convolutional Neural Networks (CNNs) have demonstrated outstanding performance in computer vision-based object detection tasks, including forest fire detection. Using CNNs to detect forest fires by segmenting both flame and smoke pixels can provide not only early and accurate detection but also additional information such as the size, spread, location, and movement of the fire. However, CNN-based segmentation networks are computationally demanding and can be difficult to incorporate onboard lightweight mobile platforms, such as an Uncrewed Aerial Vehicle (UAV). To address this issue, this paper proposes a new efficient upsampling technique based on transposed convolution to make segmentation CNNs lighter. The proposed technique, named Reversed Depthwise Separable Transposed Convolution (RDSTC), achieved F1-scores of 0.78 for smoke and 0.74 for flame, outperforming U-Net networks with bilinear upsampling, transposed convolution, and CARAFE upsampling. Additionally, a Multi-signature Fire Detection Network (MsFireD-Net) is proposed in this paper, having 93% fewer parameters and 94% fewer computations than the RDSTC U-Net. Despite being such a lightweight and efficient network, MsFireD-Net has demonstrated strong results against the other U-Net-based networks.
Funding: This work was supported by the Project of Sichuan Outstanding Young Scientific and Technological Talents (19JCQN0003), the Major Project of the Education Department in Sichuan (17ZA0063 and 2017JQ0030), in part by the Natural Science Foundation for Young Scientists of CUIT (J201704), and the Sichuan Science and Technology Program (2019JDRC0077).
Abstract: Cardiomyopathy is one of the most serious public health threats. Precise structural and functional cardiac measurement is an essential step for clinical diagnosis and follow-up treatment planning. Cardiologists are often required to draw endocardial and epicardial contours of the left ventricle (LV) manually during routine clinical diagnosis or treatment planning. This task is time-consuming and error-prone. Therefore, it is necessary to develop a fully automated end-to-end semantic segmentation method for cardiac magnetic resonance (CMR) imaging datasets. However, due to the low image quality and the deformation caused by the heartbeat, there is no effective tool for the fully automated end-to-end cardiac segmentation task. In this work, we propose a multi-scale segmentation network (MSSN) for left ventricle segmentation. It can effectively learn myocardium and blood pool structure representations from 2D short-axis CMR image slices in a multi-scale way. Specifically, our method employs both parallel and serial dilated convolution layers with different dilation rates to capture multi-scale semantic features. Moreover, we design graduated up-sampling layers with subpixel layers as the decoder to reconstruct lost spatial information and produce accurate segmentation masks. We validated our method using 164 T1 Mapping CMR images and showed that it outperforms advanced convolutional neural network (CNN) models. In the validation metrics, we achieved a Dice Similarity Coefficient (DSC) of 78.96%.
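The Dice Similarity Coefficient reported above can be sketched for binary masks as follows. This is a generic illustration, not the evaluation code of the MSSN paper; the function name `dice_coefficient`, the flat 0/1 mask representation, and the toy masks are assumptions made for the example.

```python
def dice_coefficient(pred, target):
    """Dice Similarity Coefficient for flat binary masks (lists of 0/1)."""
    # Overlap: pixels marked foreground in both masks.
    inter = sum(1 for p, t in zip(pred, target) if p == 1 and t == 1)
    total = sum(pred) + sum(target)
    # Convention assumed here: two empty masks count as a perfect match.
    return 2 * inter / total if total else 1.0


# Toy masks (hypothetical data): 2 overlapping pixels, 3 foreground pixels each.
pred = [1, 1, 0, 0, 1]
target = [1, 0, 0, 1, 1]
dsc = dice_coefficient(pred, target)
print(dsc)              # 2 * 2 / (3 + 3)
print(dsc / (2 - dsc))  # the corresponding IoU, via IoU = DSC / (2 - DSC)
```

For a single binary mask, Dice and IoU are monotonically related, so the Dice score is always at least as large as the IoU for the same prediction.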
Abstract: In recent years, computer vision has found wide application in maritime surveillance with its sophisticated algorithms and advanced architectures. Automatic ship detection with computer vision techniques provides an efficient means to monitor and track ships in water bodies. Waterways, being an important medium of transport, require continuous monitoring for the protection of national security. Remote sensing satellite images of ships in harbours and water bodies are the image data that help neural network models localize ships and facilitate early identification of possible threats at sea. This paper proposes a deep learning based model capable of classifying between ships and no-ships as well as localizing ships in the original images using a bounding box technique. Furthermore, classified ships are segmented with a deep learning based auto-encoder model. In terms of classification, the proposed model provides successful results, with 99.5% validation accuracy and 99.2% training accuracy. The auto-encoder model produces 85.1% validation accuracy and 84.2% training accuracy. Moreover, the IoU metric of the segmented images is found to be 0.77. The experimental results reveal that the model is accurate and can be implemented for automatic ship detection in water bodies, taking remote sensing satellite images as input to the computer vision system.
Abstract: Semantic change detection is an extension of the change detection task: it not only identifies the changed regions but also analyzes the semantic details (labels/categories) of the land area before and after the timelines under analysis. Periodic land change analysis is used in many real-time applications for valuation purposes. The majority of research works focus on Convolutional Neural Networks (CNNs) that analyze changes alone. Semantic information about the changes is missing, so there is no communication between the different semantic timelines and the changes detected over the region. To overcome this limitation, a CNN is proposed that incorporates the pre-trained ResNet-34 model into Fully Convolutional Network (FCN) blocks to explore the temporal data of satellite images in different timelines, and the change map between the two timelines is analyzed. Furthermore, the model achieves better results by analyzing the semantic information between the timelines and by using localized information collected from skip connections, which helps generate a better change map with the categories that may have changed over a land area across timelines. The proposed model effectively examines semantic changes, such as from-to changes on land over a time period. The experimental results on SECOND (Semantic Change detectiON Dataset) indicate that the proposed model yields a notable improvement in performance compared with existing approaches, and that it also improves semantic segmentation of images over different timelines and of the changed land areas across timelines.
Funding: Project supported by the National Key R&D Program of China (No. 2017YFB1300205).
Abstract: Although deep neural networks (DNNs) have achieved great success in semantic segmentation tasks, real-time application remains challenging. A large number of feature channels, parameters, and floating-point operations make the network sluggish and computationally heavy, which is not desirable for real-time tasks such as robotics and autonomous driving. Most approaches, however, sacrifice spatial resolution to achieve real-time inference speed, resulting in poor performance. In this paper, we propose a light-weight stage-pooling semantic segmentation network (SPSSN), which can efficiently reuse the paramount features from early layers at multiple stages and at different spatial resolutions. SPSSN takes full-resolution input of 2048×1024 pixels, uses only 1.42×10^6 parameters, yields 69.4% mIoU accuracy without pre-training, and obtains an inference speed of 59 frames/s on the Cityscapes dataset. Owing to its light-weight architecture, SPSSN can run directly on mobile devices in real time. To demonstrate the effectiveness of the proposed network, we compare our results with those of state-of-the-art networks.
Funding: National Natural Science Foundation of China under Grant No. 61773295.
Abstract: Existing semantic segmentation networks based on the multi-column structure can hardly satisfy the efficiency and precision requirements simultaneously because of their shallow spatial branches. In this paper, we propose a new efficient multi-column network, termed LadderNet, to address this problem. LadderNet includes two branches: the spatial branch generates a high-resolution output feature map, and the context branch encodes accurate semantic information. In particular, we first propose a channel attention fusion block and a global context module to enhance the information encoding ability of the context branch. Subsequently, a new branch fusion method, i.e., fusing some middle feature maps of the context branch into the spatial branch, is developed to increase the depth of the spatial branch. Meanwhile, we design a feature fusing module to enhance the fusion quality of these two branches, leading to a more efficient network. We compare our model with other state-of-the-art methods on the PASCAL VOC 2012 and Cityscapes benchmarks. Experimental results demonstrate that LadderNet achieves an average mIoU improvement of 1.25% over these methods with comparable or less computation.
Funding: The authors extend their appreciation to the Deputyship for Research and Innovation, Ministry of Education in Saudi Arabia, for funding this research work through project number (DRI−KSU−415).
Abstract: The accurate segmentation of retinal vessels is a challenging task due to the presence of various pathologies as well as the low contrast of thin vessels and non-uniform illumination. In recent years, encoder-decoder networks have achieved outstanding performance in retinal vessel segmentation at the cost of high computational complexity. To address the aforementioned challenges and to reduce the computational complexity, we propose a lightweight convolutional neural network (CNN)-based encoder-decoder deep learning model for accurate retinal vessel segmentation. The proposed deep learning model consists of an encoder-decoder architecture along with bottleneck layers that consist of depth-wise squeezing, followed by full convolution, and finally depth-wise stretching. The inspiration for the proposed model is taken from the recently developed Anam-Net model, which was tested on CT images for COVID-19 identification. For our lightweight model, we used a stack of two 3 × 3 convolution layers (without spatial pooling in between) instead of a single 3 × 3 convolution layer as proposed in Anam-Net to increase the receptive field and to reduce the trainable parameters. The proposed method includes fewer filters in all convolutional layers than the original Anam-Net and does not have an increasing number of filters for decreasing resolution. These modifications do not compromise on the segmentation accuracy, but they do make the architecture significantly lighter in terms of the number of trainable parameters and computation time. The proposed architecture has comparatively fewer parameters (1.01M) than Anam-Net (4.47M), U-Net (31.05M), SegNet (29.50M), and most of the other recent works. The proposed model does not require any problem-specific pre- or post-processing, nor does it rely on handcrafted features. In addition, being both accurate and lightweight makes the proposed method a suitable candidate for use in screening platforms at the point of care. We evaluated our proposed model on open-access datasets, namely DRIVE, STARE, and CHASE_DB. The experimental results show that the proposed model outperforms several state-of-the-art methods, such as U-Net and its variants, fully convolutional network (FCN), SegNet, CCNet, ResWNet, residual connection-based encoder-decoder network (RCED-Net), and scale-space approx. network (SSANet) in terms of {dice coefficient, sensitivity (SN), accuracy (ACC), and the area under the ROC curve (AUC)} with the scores of {0.8184, 0.8561, 0.9669, and 0.9868} on the DRIVE dataset, the scores of {0.8233, 0.8581, 0.9726, and 0.9901} on the STARE dataset, and the scores of {0.8138, 0.8604, 0.9752, and 0.9906} on the CHASE_DB dataset. Additionally, we perform cross-training experiments on the DRIVE and STARE datasets. The result of this experiment indicates the generalization ability and robustness of the proposed model.
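The abstract above says that stacking two 3 × 3 convolutions while using fewer filters both enlarges the receptive field and reduces trainable parameters. The arithmetic can be sketched as below. The channel widths (64 and 32) are hypothetical values chosen only for illustration and are not the widths used in the proposed model or in Anam-Net; the helper names are likewise invented for this example.

```python
def conv_params(kernel, c_in, c_out, bias=True):
    """Trainable parameters of one 2D convolution layer."""
    return kernel * kernel * c_in * c_out + (c_out if bias else 0)


def receptive_field(kernels):
    """Receptive field of stacked stride-1, undilated convolutions."""
    return 1 + sum(k - 1 for k in kernels)


# Hypothetical channel widths, for illustration only.
single = conv_params(3, 64, 64)                             # one 3x3 layer, 64 filters
stacked = conv_params(3, 64, 32) + conv_params(3, 32, 32)   # two 3x3 layers, 32 filters

print(receptive_field([3]), single)      # 3 36928
print(receptive_field([3, 3]), stacked)  # 5 27712
```

With equal filter counts, two stacked layers would cost more parameters than one, so the saving in this sketch comes from the narrower layers, while the larger receptive field comes from the stacking.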
Funding: Supported by the Major Consulting and Research Project of the Chinese Academy of Engineering (2020-CQ-ZD-1), the National Natural Science Foundation of China (72101235), and the Zhejiang Soft Science Research Program (2023C35012).
Abstract: Image semantic segmentation is an essential technique for studying human behavior through image data. This paper proposes an image semantic segmentation method for human behavior research. First, an end-to-end convolutional neural network architecture is proposed, which consists of a depthwise separable, skip-connected fully convolutional network and a conditional random field network. Skip-connected convolution is then used to classify each pixel in the image, giving an image semantic segmentation method based on a convolutional neural network. Next, a conditional random field network is used to improve the segmentation of human behavior images, and linear and nonlinear modeling methods based on conditional random field semantic segmentation are proposed. Finally, using the proposed segmentation network, the input entrepreneurial image data is semantically segmented to obtain the contour features of the person; the network is also applied to the segmentation of images in the medical field. The experimental results show that the image semantic segmentation method is effective. It is a new way to use image data to study human behavior and can be extended to other research areas.
Funding: Supported by the Sichuan Science and Technology Program (Grant Nos. 2019ZDZX0005, 2019YFG0496, 2020YFG0143, 2019JDJQ0002, and 2020YFG0009).
Abstract: semantics information while maintaining spatial detail contexts. Long-range context information plays a crucial role in this scenario. However, the traditional convolution kernel provides only a local and small receptive field. To address the problem, we propose a plug-and-play module aggregating both local and global information (the LGIA module) to capture the high-order relationships between nodes that are far apart. We incorporate both local and global correlations into a hypergraph, which is able to capture high-order relationships between nodes via the concept of a hyperedge connecting a subset of nodes. The local correlation considers neighborhood nodes that are spatially adjacent and similar in the same CNN feature maps of a magnetic resonance (MR) image, and the global correlation is searched from a batch of CNN feature maps of MR images in feature space. The influence of these two correlations on semantic segmentation is complementary. We validated our LGIA module on various CNN segmentation models with the cardiac MR image dataset. Experimental results demonstrate that our approach outperformed several baseline models.
Abstract: In the context of automated analysis of eye fundus images, it is an important common fallacy that prior works achieve very high scores in segmentation of lesions, and that fallacy is fueled by some reviews reporting very high scores, and perhaps by some confusion with terms. A simple analysis of the detail of the few prior works that really do segmentation reveals scores between 7% and 70% in sensitivity for 1 FPI. That is clearly below the level of medical doctors trained to detect signs of Diabetic Retinopathy, since they can distinguish well the contours of lesions in Eye Fundus Images (EFI). Still, a full segmentation of lesions could be an important step for both visualization and further automated analysis, using rigorous quantification of areas and numbers of lesions to improve diagnosis. I discuss what prior work really does, using evidence-based analysis, and compare it with segmentation networks on the terms used by prior work to show that the best performing segmentation network outperforms those prior works. I also compare architectures to understand how the network architecture influences the results. I conclude that, with the correct architecture and tuning, the semantic segmentation network improves up to 20 percentage points over prior work in the real task of segmentation of lesions. I also conclude that the network architecture and optimizations are important factors and that there are still important limitations in current work.