With the rapid development of artificial intelligence and the widespread use of the Internet of Things, semantic communication, as an emerging communication paradigm, has been attracting great interest. Taking image transmission as an example, from a semantic communication perspective, not all pixels in an image are equally important for a given receiver. Existing semantic communication systems perform semantic encoding and decoding directly on the whole image, so the region of interest cannot be identified. In this paper, we propose a novel semantic communication system for image transmission that can distinguish between Regions Of Interest (ROI) and Regions Of Non-Interest (RONI) based on semantic segmentation, where a semantic segmentation algorithm classifies each pixel of the image to separate ROI from RONI. The system also enables high-quality transmission of ROI with lower communication overhead by transmitting through different semantic communication networks with different bandwidth requirements. An improved metric, θPSNR, is proposed to evaluate the transmission accuracy of the novel semantic transmission network. Experimental results show that our proposed system achieves a significant performance improvement compared with existing approaches, namely, existing semantic communication approaches and the conventional approach without semantics.
With the development of underwater sonar detection technology, the simultaneous localization and mapping (SLAM) approach has attracted much attention in the underwater navigation field in recent years. However, the weak detection ability of a single vehicle limits SLAM performance in wide areas. Cooperative SLAM using multiple vehicles has therefore become an important research direction. The key factor in cooperative SLAM is timely and efficient sonar image transmission among underwater vehicles. However, the limited bandwidth of underwater acoustic channels conflicts with the large amount of sonar image data, so it is essential to compress the images before transmission. Recently, deep neural networks have shown great value in image compression by virtue of their powerful learning ability, but existing neural network-based sonar image compression methods usually focus on pixel-level information and neglect semantic-level information. In this paper, we propose a novel underwater acoustic transmission scheme called UAT-SSIC that includes a semantic segmentation-based sonar image compression (SSIC) framework and a joint source-channel codec, to improve the accuracy of the semantic information of the reconstructed sonar image at the receiver. The SSIC framework consists of an Auto-Encoder structure-based sonar image compression network, which is measured by a semantic segmentation network's residual. Considering that sonar images have blurred target edges, the semantic segmentation network uses a special dilated convolution neural network (DiCNN) to enhance segmentation accuracy by expanding the range of receptive fields. The proposed joint source-channel codec with unequal error protection adjusts the power level of the transmitted data to deal with sonar image transmission errors caused by the harsh underwater acoustic channel. Experimental results demonstrate that our method preserves more semantic information, with advantages over existing methods at the same compression ratio. It also improves the error tolerance and packet loss resistance of transmission.
Semantic segmentation of driving scene images is crucial for autonomous driving. While deep learning technology has significantly improved daytime image semantic segmentation, nighttime images pose challenges due to factors like poor lighting and overexposure, making it difficult to recognize small objects. To address this, we propose an Image Adaptive Enhancement (IAEN) module comprising a parameter predictor (Edip), multiple image processing filters (Mdif), and a Detail Processing Module (DPM). Edip combines image processing filters to predict parameters like exposure and hue, optimizing image quality. We adopt a novel image encoder to enhance parameter prediction accuracy by enabling Edip to handle features at different scales. DPM strengthens overlooked image details, extending the IAEN module’s functionality. After the segmentation network, we integrate a Depth Guided Filter (DGF) to refine segmentation outputs. The entire network is trained end-to-end, with segmentation results guiding parameter prediction optimization, promoting self-learning and network improvement. This lightweight and efficient network architecture is particularly suitable for addressing challenges in nighttime image segmentation. Extensive experiments validate significant performance improvements of our approach on the ACDC-night and Nightcity datasets.
Computed Tomography (CT) is a commonly used technology in the non-destructive testing of Printed Circuit Boards (PCB), and element segmentation of CT images is a key subsequent step. With the development of deep learning, researchers began to exploit the “pre-training and fine-tuning” training process for multi-element segmentation, reducing the time spent on manual annotation. However, existing element segmentation models focus only on overall pixel-level accuracy, ignoring whether the connectivity relationships between elements are correctly identified. To this end, this paper proposes a PCB CT image element segmentation model optimizing the semantic perception of connectivity relationship (OSPC-seg). The overall training adopts a “pre-training and fine-tuning” process. A loss function that optimizes the semantic perception of circuit connectivity relationship (OSPC Loss) is designed to alleviate the class imbalance problem and improve the correct connectivity rate. In addition, the correct connectivity rate index (CCR) is proposed to evaluate the model’s ability to recognize connectivity relationships. Experiments show that the mIoU and CCR of OSPC-seg on our datasets are 90.1% and 97.0%, improvements of 1.5% and 1.6%, respectively, over the baseline model. Visualization results show that the segmentation performance at connection positions is significantly improved, which also demonstrates the effectiveness of OSPC-seg.
Because pixel values of foggy images are irregularly higher than those of images captured in normal weather (clear images), it is difficult to extract and express their texture. No method has previously been developed to directly explore the relationship between foggy images and semantic segmentation images. We investigated this relationship and propose a generative adversarial network (GAN) for foggy image semantic segmentation (FISS GAN), which contains two parts: an edge GAN and a semantic segmentation GAN. The edge GAN is designed to generate edge information from foggy images to provide auxiliary information to the semantic segmentation GAN. The semantic segmentation GAN is designed to extract and express the texture of foggy images and generate semantic segmentation images. Experiments on foggy cityscapes datasets and foggy driving datasets indicated that FISS GAN achieved state-of-the-art performance.
Semantic segmentation is a crucial step for document understanding. In this paper, an NVIDIA Jetson Nano-based platform is applied to implement semantic segmentation for teaching artificial intelligence concepts and programming. To extract semantic structures from document images, we present an end-to-end dilated convolution network architecture. Dilated convolutions have well-known advantages for extracting multi-scale context information without losing spatial resolution. Our model uses dilated convolutions with a residual network to represent the image features and predict pixel labels. The convolution part works as a feature extractor to obtain multidimensional and hierarchical image features. The subsequent deconvolution produces a full-resolution segmentation prediction. The probability of each pixel decides its predefined semantic class label. To understand segmentation granularity, we compare performance at three different levels. From fine-grained class to coarse class levels, the proposed dilated convolution network architecture is evaluated on three document datasets. The experimental results show that both semantic data distribution imbalance and network depth are important factors that influence document semantic segmentation performance. The research is aimed at offering an educational resource for teaching artificial intelligence concepts and techniques.
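The claim in the abstract above, that dilated convolutions gather multi-scale context without losing spatial resolution, can be illustrated with a small receptive-field calculation. This is a minimal sketch using the standard formulas for stride-1 convolutions. The layer configurations are hypothetical and are not taken from the paper.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span (in pixels) covered by one dilated kernel along one axis."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def receptive_field(layers) -> int:
    """Receptive field of a stack of stride-1 convolutions.

    `layers` is a list of (kernel_size, dilation) pairs, applied in order.
    With stride 1 everywhere, each layer widens the field by
    (kernel_size - 1) * dilation pixels.
    """
    field = 1
    for kernel_size, dilation in layers:
        field += (kernel_size - 1) * dilation
    return field


# Hypothetical three-layer stacks of 3x3 kernels (not the paper's network).
plain = receptive_field([(3, 1), (3, 1), (3, 1)])
dilated = receptive_field([(3, 1), (3, 2), (3, 4)])
print(plain, dilated)
```

Both stacks have the same number of weights and keep the feature map at full resolution (given suitable padding), but the dilated stack sees a wider context per output pixel.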
This letter presents an efficient and simple image segmentation method for semantic object spatial segmentation. First, the image is filtered using contour-preserving filters. Then it is quasi-flat labeled. The small regions near the contour are classified as uncertain regions and are eliminated by region growing and merging. Further region merging is used to reduce the number of regions. The simulation results show its efficiency and simplicity. It preserves the semantic object shape while emphasizing the perceptually complex parts of the object, so it conforms to human visual perception very well.
In view of the problems of multi-scale changes of segmentation targets, noise interference, rough segmentation results, and slow training faced by medical image semantic segmentation, a multi-scale residual aggregation U-shaped attention network, MAAUNet (MultiRes aggregation attention UNet), is proposed based on MultiResUNet. Firstly, an aggregate connection is introduced from the original feature aggregation at the same level. The skip connection is redesigned to aggregate features of different semantic scales at the decoder subnet, which further addresses the semantic gaps that may exist between skip connections. Secondly, after the multi-scale convolution module, a convolution block attention module is added to focus and integrate features in the two attention directions of channel and space, adaptively optimizing the intermediate feature map. Finally, the original convolution block is improved. The convolution channels are expanded with a series convolution structure to complement each other and extract richer spatial features. Residual connections are retained and the convolution block is turned into a multi-channel convolution block, so that the model extracts multi-scale spatial features. The experimental results show that MAAUNet is highly competitive on challenging datasets and shows good segmentation performance and stability in dealing with multi-scale input and noise interference.
Currently, deep convolutional neural networks have made great progress in the field of semantic segmentation. Because of the fixed geometry of the convolution kernel, standard convolutional neural networks have a limited ability to model geometric transformations. Therefore, a deformable convolution is introduced to enhance the adaptability of convolutional networks to spatial transformation. In addition, deep convolutional neural networks cannot adequately segment local objects at the output layer because of the pooling layers in the network architecture. To overcome this shortcoming, the rough segmentation predictions of the network output layer are processed by fully connected conditional random fields to improve segmentation quality. The proposed method can easily be trained end-to-end using standard backpropagation algorithms. Finally, the proposed method is tested on the ISPRS dataset. The results show that the proposed method can effectively overcome the influence of the complex structure of the segmentation object and obtain state-of-the-art accuracy on the ISPRS Vaihingen 2D semantic labeling dataset.
There are two types of methods for image segmentation. One is traditional image processing methods, which are sensitive to details and boundaries, yet fail to recognize semantic information. The other is deep learning methods, which can locate and identify different objects, but whose boundary identification is not accurate enough. Neither can generate complete segmentation information. In order to obtain accurate edge detection and semantic information, an Adaptive Boundary and Semantic Composite Segmentation method (ABSCS) is proposed. This method can precisely perform semantic segmentation of individual objects in large-size aerial images with limited GPU performance. It includes adaptively dividing and modifying the aerial images with the proposed principles and methods, using the deep learning method to semantically segment and preprocess the small divided pieces, using three traditional methods to segment and preprocess original-size aerial images, adaptively selecting traditional results to modify the boundaries of individual objects in deep learning results, and combining the results of different objects. Individual object semantic segmentation experiments are conducted using the AeroScapes dataset, and their results are analyzed qualitatively and quantitatively. The experimental results demonstrate that the proposed method can achieve more promising object boundaries than the original deep learning method. This work also demonstrates the advantages of the proposed method in applications of point cloud semantic segmentation and image inpainting.
To resist the risk of the stego-image being maliciously altered during transmission, we propose a coverless image steganography method based on image segmentation. Most existing coverless steganography methods are based on whole feature mapping, which has poor robustness when facing geometric attacks, because the contents of the image are easily lost. To solve this problem, we use ResNet to extract semantic features, and segment the object areas from the image through Mask RCNN for information hiding. These selected object areas have ethical structural integrity and are not located in the visual center of the image, reducing the information loss caused by malicious attacks. Then, these object areas are binarized to generate hash sequences for information mapping. In transmission, only a set of stego-images unrelated to the secret information is transmitted, so the method can fundamentally resist steganalysis. At the same time, since both Mask RCNN and ResNet have excellent robustness, pre-training the model through supervised learning can achieve good performance. The robust hash algorithm can also resist attacks during transmission. Although image segmentation reduces the capacity, multiple object areas can be extracted from an image to ensure the capacity to a certain extent. Experimental results show that compared with other coverless image steganography methods, our method is more robust when facing geometric attacks.
Semantic segmentation methods based on CNNs have made great progress, but they still have shortcomings when applied to remote sensing image segmentation; for example, the small receptive field cannot effectively capture global context. To solve this problem, this paper proposes a hybrid model based on ResNet50 and Swin Transformer to directly capture long-range dependence, which fuses features through a Cross Feature Modulation Module (CFMM). On two publicly available datasets, Vaihingen and Potsdam, the model achieves mIoU of 70.27% and 76.63%, respectively. Thus, CFM-UNet can maintain high segmentation performance compared with other competitive networks.
In recent years, the multimedia annotation problem has been attracting significant research attention in the multimedia and computer vision areas, especially automatic image annotation, whose purpose is to provide an efficient and effective searching environment for users to query their images more easily. In this paper, a semi-supervised learning based probabilistic latent semantic analysis (PLSA) model for automatic image annotation is presented. Since it is often hard to obtain or create labeled images in large quantities while unlabeled ones are easier to collect, a transductive support vector machine (TSVM) is exploited to enhance the quality of the training image data. Then, different image features with different magnitudes will result in different performance for automatic image annotation. To this end, a Gaussian normalization method is utilized to normalize different features extracted from effective image regions segmented by the normalized cuts algorithm so as to preserve the intrinsic content of images as completely as possible. Finally, a PLSA model with asymmetric modalities is constructed based on the expectation maximization (EM) algorithm to predict a candidate set of annotations with confidence scores. Extensive experiments on the general-purpose Corel5k dataset demonstrate that the proposed model can significantly improve the performance of traditional PLSA for the task of automatic image annotation.
Automatic segmentation of early esophagus cancer (EEC) in gastrointestinal endoscopy (GIE) images is a critical and challenging task in clinical settings, which relies primarily on labor-intensive and time-consuming routines. EEC has often been diagnosed at the late stage since early signs of cancer are not obvious, resulting in low survival rates. This work proposes a deep learning approach based on the U-Net++ method to segment EEC in GIE images. A total of 2690 GIE images collected from 617 patients at the Digestive Endoscopy Center, West China Hospital of Sichuan University, China, have been utilized. The experimental result shows that our proposed method achieved promising results. Furthermore, the proposed method has been compared with other U-Net-related methods using the same dataset. The mean and standard deviation (SD) of the dice similarity coefficient (DSC), intersection over union (IoU), precision (Pre), and recall (Rec) achieved by the proposed framework were DSC (%) = 94.62 ± 0.02, IoU (%) = 90.99 ± 0.04, Pre (%) = 94.61 ± 0.04, and Rec (%) = 95.00 ± 0.02, respectively, outperforming the others. The proposed method has the potential to be applied in EEC automatic diagnoses.
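The four metrics reported in the abstract above (DSC, IoU, precision, recall) have standard definitions in terms of true positives, false positives, and false negatives. The following is a minimal sketch for a single pair of binary masks, assuming flattened 0/1 pixel lists. The function name and the toy masks are illustrative and do not come from the paper.

```python
def binary_mask_metrics(pred, truth):
    """Compute DSC, IoU, precision and recall for two flat 0/1 masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    # Convention used here: two empty masks count as a perfect match.
    dice = 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 1.0
    iou = tp / (tp + fp + fn) if (tp + fp + fn) else 1.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"dsc": dice, "iou": iou, "precision": precision, "recall": recall}


# Toy 8-pixel masks (hypothetical data): 3 true positives, 1 FP, 1 FN.
pred = [1, 1, 1, 0, 0, 0, 1, 0]
truth = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = binary_mask_metrics(pred, truth)
print(metrics)
```

Dataset-level figures such as a mean ± SD are typically obtained by computing these values per image and then aggregating, which is an assumption here rather than something the abstract states.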
This paper proposes an improved high-precision 3D semantic mapping method for indoor scenes using RGB-D images. Current semantic mapping algorithms suffer from low semantic annotation accuracy and insufficient real-time performance. To address these issues, we first adopt the Elastic Fusion algorithm to select key frames from indoor environment image sequences captured by the Kinect sensor and construct the indoor environment space model. Then, an indoor RGB-D image semantic segmentation network is proposed, which uses multi-scale feature fusion to quickly and accurately obtain object labeling information at the pixel level of the spatial point cloud model. Finally, Bayesian updating is used to conduct incremental semantic label fusion on the established spatial point cloud model. We also employ dense conditional random fields (CRF) to optimize the 3D semantic map model, resulting in a high-precision spatial semantic map of indoor scenes. Experimental results show that the proposed semantic mapping system can process image sequences collected by RGB-D sensors in real time and output accurate semantic segmentation results of indoor scene images and the current local spatial semantic map. Finally, it constructs a globally consistent, high-precision 3D semantic map of indoor scenes.
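The incremental label fusion step mentioned in the abstract above can be pictured as a recursive Bayesian update of a per-point class distribution. The sketch below assumes each map point stores a probability vector over classes and that each new frame contributes a per-class probability for that point. This is a common formulation, not necessarily the exact update used in the paper, and all names and numbers are hypothetical.

```python
def fuse_labels(prior, frame_probs):
    """One Bayesian update: posterior is proportional to prior * frame_probs."""
    if len(prior) != len(frame_probs):
        raise ValueError("class vectors must have the same length")
    unnormalized = [p * q for p, q in zip(prior, frame_probs)]
    total = sum(unnormalized)
    if total == 0:
        # Degenerate case: the new observation contradicts the prior
        # everywhere, so keep the previous belief unchanged.
        return list(prior)
    return [u / total for u in unnormalized]


# Hypothetical 3-class example: start from a uniform belief, then fuse the
# segmentation network's output for the same 3D point from two frames.
belief = [1 / 3, 1 / 3, 1 / 3]
for frame_probs in ([0.6, 0.3, 0.1], [0.7, 0.2, 0.1]):
    belief = fuse_labels(belief, frame_probs)

best_class = belief.index(max(belief))
print(belief, best_class)
```

Repeated agreeing observations sharpen the distribution, which is why fusing labels across frames tends to be more stable than trusting any single frame.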
In order to accurately segment architectural features in high-resolution remote sensing images, a semantic segmentation method based on U-net network multi-task learning is proposed. First, a boundary distance map was generated based on the remote sensing image of the ground truth map of the building. The remote sensing image and its truth map were used as the input to the U-net network, followed by the addition of the building ground prediction layer at the end of the U-net network. Based on the ResNet network, a multi-task network with the boundary distance prediction layer was built. Experiments on the ISPRS aerial remote sensing image building and feature annotation data set show that, compared with the fully convolutional network combined with the multi-layer perceptron method, the intersection-over-union (IoU) of the VGG16 network, VGG16 + boundary prediction, ResNet50, and the method in this paper increased by 5.15%, 6.946%, 6.41%, and 7.86%, respectively. The accuracy of the networks increased to 94.71%, 95.39%, 95.30%, and 96.10%, respectively, resulting in high-precision extraction of building features.
Significant advancements have been achieved in road surface extraction based on high-resolution remote sensing image processing. Most current methods rely on fully supervised learning, which necessitates enormous human effort to label the image. Within this field, other research endeavors utilize weakly supervised methods. These approaches aim to reduce the expenses associated with annotation by leveraging sparsely annotated data, such as scribbles. This paper presents a novel technique called a weakly supervised network using scribble-supervised and edge-mask (WSSE-net). This network is a three-branch network architecture, whereby each branch is equipped with a distinct decoder module dedicated to road extraction tasks. One of the branches is dedicated to generating edge masks using edge detection algorithms and optimizing road edge details. The other two branches supervise the model’s training by employing scribble labels and spreading scribble information throughout the image. To address the historical flaw that created pseudo-labels that are not updated with network training, we use mixup to blend prediction results dynamically and continually update new pseudo-labels to steer network training. Our solution demonstrates efficient operation by simultaneously considering both edge-mask aid and dynamic pseudo-label support. The studies are conducted on three separate road datasets, which consist primarily of high-resolution remote-sensing satellite photos and drone images. The experimental findings suggest that our methodology performs better than advanced scribble-supervised approaches and specific traditional fully supervised methods.
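The abstract above says only that mixup blends prediction results to refresh pseudo-labels. As a rough illustration, mixup is a convex combination of two inputs with a weight between 0 and 1. The sketch below assumes the two inputs are per-pixel road probabilities from two branches and that the blend is thresholded into a hard pseudo-label. The choice of inputs, the fixed weight, and the 0.5 threshold are all assumptions, not details from the paper.

```python
def mixup(first, second, weight):
    """Convex combination: weight * first + (1 - weight) * second."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    if len(first) != len(second):
        raise ValueError("inputs must have the same length")
    return [weight * a + (1.0 - weight) * b for a, b in zip(first, second)]


def to_pseudo_label(probabilities, threshold=0.5):
    """Turn blended road probabilities into a hard 0/1 pseudo-label."""
    return [1 if p >= threshold else 0 for p in probabilities]


# Hypothetical per-pixel road probabilities from two decoder branches.
branch_a = [0.9, 0.2, 0.6]
branch_b = [0.7, 0.4, 0.2]

# In mixup the weight is usually drawn from a Beta distribution
# (e.g. random.betavariate); it is fixed here to keep the example simple.
blended = mixup(branch_a, branch_b, weight=0.5)
pseudo_label = to_pseudo_label(blended)
print(blended, pseudo_label)
```

Recomputing the blend at each training step is what would make the pseudo-labels dynamic rather than fixed once before training.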
We propose a novel image segmentation algorithm to tackle the challenge of limited recognition and segmentation performance in identifying welding seam images during robotic intelligent operations. Initially, to enhance the capability of deep neural networks in extracting geometric attributes from depth images, we developed a novel deep geometric convolution operator (DGConv). DGConv is utilized to construct a deep local geometric feature extraction module, facilitating a more comprehensive exploration of the intrinsic geometric information within depth images. Secondly, we integrate the newly proposed deep geometric feature module with the Fully Convolutional Network (FCN8) to establish a high-performance deep neural network algorithm tailored for depth image segmentation. Concurrently, we enhance the FCN8 detection head by separating the segmentation and classification processes. This enhancement significantly boosts the network’s overall detection capability. Thirdly, for a comprehensive assessment of our proposed algorithm and its applicability in real-world industrial settings, we curated a line-scan image dataset featuring weld seams. This dataset, named the Standardized Linear Depth Profile (SLDP) dataset, was collected from actual industrial sites where autonomous robots are in operation. Ultimately, we conducted experiments utilizing the SLDP dataset, achieving an average accuracy of 92.7%. Our proposed approach exhibited a remarkable performance improvement over the prior method on the identical dataset. Moreover, we have successfully deployed the proposed algorithm in genuine industrial environments, fulfilling the prerequisites of unmanned robot operations.
In recent years, the Internet of Things (IoT) has gradually developed applications such as collecting sensory data and building intelligent services, which has led to an explosion in mobile data traffic. Meanwhile, with the rapid development of artificial intelligence, semantic communication has attracted great attention as a new communication paradigm. For IoT devices, however, processing image information efficiently in real time is an essential task for the rapid transmission of semantic information. With the increase of model parameters in deep learning methods, the model inference time in sensor devices continues to increase. In contrast, the Pulse Coupled Neural Network (PCNN) has fewer parameters, making it more suitable for processing real-time scene tasks such as image segmentation, which lays the foundation for real-time, effective, and accurate image transmission. However, the parameters of PCNN are determined by trial and error, which limits its application. To overcome this limitation, an Improved Pulse Coupled Neural Networks (IPCNN) model is proposed in this work. The IPCNN constructs the connection between the static properties of the input image and the dynamic properties of the neurons, and all its parameters are set adaptively, which avoids the inconvenience of manual setting in traditional methods and improves the adaptability of parameters to different types of images. Experimental segmentation results demonstrate the validity and efficiency of the proposed self-adaptive parameter setting method of IPCNN on the gray images and natural images from the Matlab and Berkeley Segmentation Datasets. The IPCNN method achieves a better segmentation result without training, providing a new solution for the real-time transmission of image semantic information.
Lower back pain is one of the most common medical problems in the world and it is experienced by a huge percentage of people everywhere. Due to its ability to produce a detailed view of the soft tissues, including the spinal cord, nerves, intervertebral discs, and vertebrae, Magnetic Resonance Imaging is thought to be the most effective method for imaging the spine. The semantic segmentation of vertebrae plays a major role in the diagnostic process of lumbar diseases. It is difficult to semantically partition the vertebrae in Magnetic Resonance Images from the surrounding variety of tissues, including muscles, ligaments, and intervertebral discs. U-Net is a powerful deep-learning architecture to handle the challenges of medical image analysis tasks and achieves high segmentation accuracy. This work proposes a modified U-Net architecture, namely MU-Net, consisting of the Meijering convolutional layer that incorporates the Meijering filter to perform the semantic segmentation of lumbar vertebrae L1 to L5 and sacral vertebra S1. Pseudo-colour mask images were generated and used as ground truth for training the model. The work has been carried out on 1312 images expanded from T1-weighted mid-sagittal MRI images of 515 patients in the Lumbar Spine MRI Dataset publicly available from Mendeley Data. The proposed MU-Net model for the semantic segmentation of the lumbar vertebrae gives better performance with 98.79% pixel accuracy (PA), 98.66% dice similarity coefficient (DSC), 97.36% Jaccard coefficient, and 92.55% mean Intersection over Union (mean IoU) using the mentioned dataset.
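For multi-class results such as those in the abstract above, pixel accuracy and mean IoU are usually derived from a confusion matrix. The following is a minimal sketch of the standard definitions on flattened label lists. The helper names and the toy labels are illustrative, and how the paper averages across images or classes is not stated in the abstract.

```python
def confusion_matrix(pred, truth, num_classes):
    """matrix[t][p] counts pixels with true class t predicted as class p."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, truth):
        matrix[t][p] += 1
    return matrix


def pixel_accuracy(matrix):
    """Fraction of pixels whose predicted class equals the true class."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total if total else 0.0


def mean_iou(matrix):
    """Average per-class IoU (Jaccard), skipping classes that never occur."""
    ious = []
    for c in range(len(matrix)):
        tp = matrix[c][c]
        fn = sum(matrix[c]) - tp
        fp = sum(row[c] for row in matrix) - tp
        if tp + fp + fn:
            ious.append(tp / (tp + fp + fn))
    return sum(ious) / len(ious) if ious else 0.0


# Toy 10-pixel, 3-class example (hypothetical labels).
truth = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2]
pred = [0, 0, 0, 1, 1, 1, 2, 2, 2, 2]
matrix = confusion_matrix(pred, truth, num_classes=3)
accuracy = pixel_accuracy(matrix)
miou = mean_iou(matrix)
print(matrix, accuracy, miou)
```

Pixel accuracy can stay high when a large class dominates the image, whereas mean IoU weights every class equally, which is one reason the two figures can differ noticeably.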
Funding: Supported in part by collaborative research with Toyota Motor Corporation, in part by ROIS NII Open Collaborative Research under Grant 21S0601, and in part by JSPS KAKENHI under Grants 20H00592 and 21H03424.
Abstract: With the rapid development of artificial intelligence and the widespread use of the Internet of Things, semantic communication, as an emerging communication paradigm, has been attracting great interest. Taking image transmission as an example, from a semantic communication perspective, not all pixels in an image are equally important for a given receiver. Existing semantic communication systems perform semantic encoding and decoding directly on the whole image, so the region of interest cannot be identified. In this paper, we propose a novel semantic communication system for image transmission that can distinguish between Regions Of Interest (ROI) and Regions Of Non-Interest (RONI) based on semantic segmentation, where a semantic segmentation algorithm classifies each pixel of the image to separate ROI from RONI. The system also enables high-quality transmission of ROI with lower communication overhead by transmitting through different semantic communication networks with different bandwidth requirements. An improved metric, θPSNR, is proposed to evaluate the transmission accuracy of the novel semantic transmission network. Experimental results show that our proposed system achieves a significant performance improvement compared with existing approaches, namely, existing semantic communication approaches and the conventional approach without semantics.
Funding: Supported in part by the Tianjin Technology Innovation Guidance Special Fund Project under Grant No. 21YDTPJC00850, in part by the National Natural Science Foundation of China under Grant No. 41906161, and in part by the Natural Science Foundation of Tianjin under Grant No. 21JCQNJC00650.
Abstract: With the development of underwater sonar detection technology, the simultaneous localization and mapping (SLAM) approach has attracted much attention in the underwater navigation field in recent years. However, the weak detection ability of a single vehicle limits SLAM performance in wide areas. Cooperative SLAM using multiple vehicles has therefore become an important research direction. The key factor in cooperative SLAM is timely and efficient sonar image transmission among underwater vehicles. However, the limited bandwidth of underwater acoustic channels conflicts with the large amount of sonar image data, so it is essential to compress the images before transmission. Recently, deep neural networks have shown great value in image compression by virtue of their powerful learning ability, but existing neural-network-based sonar image compression methods usually focus on pixel-level information and neglect semantic-level information. In this paper, we propose a novel underwater acoustic transmission scheme called UAT-SSIC, which includes a semantic segmentation-based sonar image compression (SSIC) framework and a joint source-channel codec, to improve the accuracy of the semantic information of the reconstructed sonar image at the receiver. The SSIC framework consists of an Auto-Encoder structure-based sonar image compression network, which is measured by a semantic segmentation network's residual. Considering that sonar images are characterized by blurred target edges, the semantic segmentation network uses a special dilated convolution neural network (DiCNN) to enhance segmentation accuracy by expanding the range of receptive fields. A joint source-channel codec with unequal error protection is proposed that adjusts the power level of the transmitted data to deal with sonar image transmission errors caused by the severe underwater acoustic channel. Experimental results demonstrate that our method preserves more semantic information, with advantages over existing methods at the same compression ratio. It also improves the error tolerance and packet loss resistance of transmission.
Funding: This work is supported in part by the National Natural Science Foundation of China (Grant Number 61971078), which provided domain expertise and computational power that greatly assisted the activity. This work was also financially supported by Chongqing Municipal Education Commission Grants for Major Science and Technology Project (Grant Number gzlcx20243175).
Abstract: Semantic segmentation of driving scene images is crucial for autonomous driving. While deep learning technology has significantly improved daytime image semantic segmentation, nighttime images pose challenges due to factors like poor lighting and overexposure, making it difficult to recognize small objects. To address this, we propose an Image Adaptive Enhancement (IAEN) module comprising a parameter predictor (Edip), multiple image processing filters (Mdif), and a Detail Processing Module (DPM). Edip combines image processing filters to predict parameters like exposure and hue, optimizing image quality. We adopt a novel image encoder to enhance parameter prediction accuracy by enabling Edip to handle features at different scales. DPM strengthens overlooked image details, extending the IAEN module’s functionality. After the segmentation network, we integrate a Depth Guided Filter (DGF) to refine segmentation outputs. The entire network is trained end-to-end, with segmentation results guiding parameter prediction optimization, promoting self-learning and network improvement. This lightweight and efficient network architecture is particularly suitable for addressing challenges in nighttime image segmentation. Extensive experiments validate significant performance improvements of our approach on the ACDC-night and Nightcity datasets.
Abstract: Computed Tomography (CT) is a commonly used technology in Printed Circuit Board (PCB) non-destructive testing, and element segmentation of CT images is a key subsequent step. With the development of deep learning, researchers began to exploit the “pre-training and fine-tuning” training process for multi-element segmentation, reducing the time spent on manual annotation. However, existing element segmentation models focus only on overall pixel-level accuracy and ignore whether the connectivity relationships between elements are correctly identified. To this end, this paper proposes a PCB CT image element segmentation model optimizing the semantic perception of connectivity relationship (OSPC-seg). The overall training adopts a “pre-training and fine-tuning” process. A loss function that optimizes the semantic perception of circuit connectivity relationship (OSPC Loss) is designed to alleviate the class imbalance problem and improve the correct connectivity rate. In addition, the correct connectivity rate index (CCR) is proposed to evaluate the model’s ability to recognize connectivity relationships. Experiments show that the mIoU and CCR of OSPC-seg on our datasets are 90.1% and 97.0%, improved by 1.5% and 1.6% respectively compared with the baseline model. Visualization results show that the segmentation performance at connection positions is significantly improved, which also demonstrates the effectiveness of OSPC-seg.
Funding: Supported in part by the National Key Research and Development Program of China (2018YFB1305002), the National Natural Science Foundation of China (62006256), the Postdoctoral Science Foundation of China (2020M683050), the Key Research and Development Program of Guangzhou (202007050002), and the Fundamental Research Funds for the Central Universities (67000-31610134).
Abstract: Because pixel values of foggy images are irregularly higher than those of images captured in normal weather (clear images), it is difficult to extract and express their texture. No method has previously been developed to directly explore the relationship between foggy images and semantic segmentation images. We investigated this relationship and propose a generative adversarial network (GAN) for foggy image semantic segmentation (FISS GAN), which contains two parts: an edge GAN and a semantic segmentation GAN. The edge GAN is designed to generate edge information from foggy images to provide auxiliary information to the semantic segmentation GAN. The semantic segmentation GAN is designed to extract and express the texture of foggy images and generate semantic segmentation images. Experiments on foggy cityscapes datasets and foggy driving datasets indicated that FISS GAN achieved state-of-the-art performance.
Funding: Project (61806107) supported by the National Natural Science Foundation of China; Project supported by the Shandong Key Laboratory of Wisdom Mine Information Technology, China; Project supported by the Opening Project of State Key Laboratory of Digital Publishing Technology, China.
Abstract: Semantic segmentation is a crucial step for document understanding. In this paper, an NVIDIA Jetson Nano-based platform is used to implement semantic segmentation for teaching artificial intelligence concepts and programming. To extract semantic structures from document images, we present an end-to-end dilated convolution network architecture. Dilated convolutions have well-known advantages for extracting multi-scale context information without losing spatial resolution. Our model uses dilated convolutions with a residual network to represent the image features and predict pixel labels. The convolution part works as a feature extractor to obtain multidimensional and hierarchical image features. The subsequent deconvolution produces a full-resolution segmentation prediction. The probability of each pixel decides its predefined semantic class label. To understand segmentation granularity, we compare performance at three different levels. From fine-grained class to coarse class levels, the proposed dilated convolution network architecture is evaluated on three document datasets. The experimental results show that both semantic data distribution imbalance and network depth are important factors that influence document semantic segmentation performance. The research is aimed at offering an education resource for teaching artificial intelligence concepts and techniques.
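The claim that dilated convolutions widen context without losing resolution can be made concrete with a small sketch. It is not taken from the paper: the function names and the example layer stacks are illustrative assumptions, and it covers only stride-1 layers in one dimension.

```python
def effective_kernel(kernel_size, dilation):
    """Span covered by one dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def receptive_field(layers):
    """Receptive field of stacked stride-1 convolutions.

    `layers` is a list of (kernel_size, dilation) pairs; each layer adds
    (k - 1) * d input positions to the field.
    """
    field = 1
    for kernel_size, dilation in layers:
        field += (kernel_size - 1) * dilation
    return field


def dilated_conv1d(signal, weights, dilation):
    """'Valid' 1D dilated correlation: taps are spaced `dilation` apart."""
    span = (len(weights) - 1) * dilation
    return [
        sum(weights[j] * signal[i + j * dilation] for j in range(len(weights)))
        for i in range(len(signal) - span)
    ]


# Three 3-tap layers: plain stack versus dilations 1, 2, 4 (hypothetical setup).
plain = receptive_field([(3, 1), (3, 1), (3, 1)])
dilated = receptive_field([(3, 1), (3, 2), (3, 4)])
```

With the same number of weights, the dilated stack covers roughly twice as many input positions as the plain one, and no pooling or striding is needed to get there.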
Funding: Supported by Guangdong Natural Science Foundation (No. 011628).
Abstract: This letter presents an efficient and simple image segmentation method for semantic object spatial segmentation. First, the image is filtered using contour-preserving filters. Then it is quasi-flat labeled. The small regions near the contour are classified as uncertain regions and are eliminated by region growing and merging. Further region merging is used to reduce the number of regions. The simulation results show its efficiency and simplicity. It preserves the semantic object shape while emphasizing the perceptually complex part of the object, and so conforms well to human visual perception.
Funding: National Natural Science Foundation of China (No. 61806006); Jiangsu University Superior Discipline Construction Project.
Abstract: To address the problems of multi-scale changes in segmentation targets, noise interference, rough segmentation results, and slow training faced by medical image semantic segmentation, a multi-scale residual aggregation U-shaped attention network, MAAUNet (MultiRes aggregation attention UNet), is proposed based on MultiResUNet. Firstly, aggregate connection is introduced from the original feature aggregation at the same level. The skip connection is redesigned to aggregate features of different semantic scales at the decoder subnet, which further addresses the semantic gaps that may exist between skip connections. Secondly, after the multi-scale convolution module, a convolution block attention module is added to focus and integrate features in the two attention directions of channel and space, adaptively optimizing the intermediate feature map. Finally, the original convolution block is improved. The convolution channels are expanded with a series convolution structure so that they complement each other and extract richer spatial features. Residual connections are retained and the convolution block is turned into a multi-channel convolution block, enabling the model to extract multi-scale spatial features. The experimental results show that MAAUNet is highly competitive on challenging datasets and shows good segmentation performance and stability in dealing with multi-scale input and noise interference.
Funding: National Key Research and Development Program of China (No. 2017YFC0405806).
Abstract: Currently, deep convolutional neural networks have made great progress in the field of semantic segmentation. Because of their fixed convolution kernel geometry, standard convolutional neural networks have limited ability to model geometric transformations. Therefore, a deformable convolution is introduced to enhance the adaptability of convolutional networks to spatial transformation. In addition, deep convolutional neural networks cannot adequately segment local objects at the output layer because of the pooling layers in the network architecture. To overcome this shortcoming, the rough segmentation predictions of the network's output layer are processed by fully connected conditional random fields to improve segmentation quality. The proposed method can easily be trained end-to-end using standard backpropagation algorithms. Finally, the proposed method is tested on the ISPRS dataset. The results show that the proposed method can effectively overcome the influence of the complex structure of the segmentation object and obtain state-of-the-art accuracy on the ISPRS Vaihingen 2D semantic labeling dataset.
Funding: Funded in part by the Equipment Pre-Research Foundation of China, Grant No. 61400010203, and in part by the Independent Project of the State Key Laboratory of Virtual Reality Technology and Systems.
Abstract: There are two types of methods for image segmentation. One is traditional image processing methods, which are sensitive to details and boundaries yet fail to recognize semantic information. The other is deep learning methods, which can locate and identify different objects but do not identify boundaries accurately enough. Neither can generate complete segmentation information. In order to obtain accurate edge detection and semantic information, an Adaptive Boundary and Semantic Composite Segmentation method (ABSCS) is proposed. This method can precisely perform semantic segmentation of individual objects in large-size aerial images with limited GPU performance. It includes adaptively dividing and modifying the aerial images with the proposed principles and methods, using the deep learning method to semantically segment and preprocess the small divided pieces, using three traditional methods to segment and preprocess original-size aerial images, adaptively selecting traditional results to modify the boundaries of individual objects in deep learning results, and combining the results of different objects. Individual object semantic segmentation experiments are conducted using the AeroScapes dataset, and their results are analyzed qualitatively and quantitatively. The experimental results demonstrate that the proposed method can achieve more promising object boundaries than the original deep learning method. This work also demonstrates the advantages of the proposed method in applications of point cloud semantic segmentation and image inpainting.
Funding: This work was supported in part by the National Natural Science Foundation of China under Grant 61772561 (author J.Q, http://www.nsfc.gov.cn/); in part by the Key Research and Development Plan of Hunan Province under Grant 2018NK2012 (author J.Q, http://kjt.hunan.gov.cn/); in part by the Science Research Projects of Hunan Provincial Education Department under Grant 18A174 (author X.X, http://kxjsc.gov.hnedu.cn/); in part by the Degree & Postgraduate Education Reform Project of Hunan Province under Grant 2019JGYB154 (author J.Q, http://xwb.gov.hnedu.cn/); in part by the Postgraduate Excellent teaching team Project of Hunan Province under Grant [2019]370-133 (author J.Q, http://xwb.gov.hnedu.cn/); and in part by the Postgraduate Education and Teaching Reform Project of Central South University of Forestry & Technology under Grant 2019JG013 (author X.X, http://jwc.csuft.edu.cn/).
Abstract: To resist the risk of the stego-image being maliciously altered during transmission, we propose a coverless image steganography method based on image segmentation. Most existing coverless steganography methods are based on whole feature mapping, which has poor robustness when facing geometric attacks, because the contents of the image are easily lost. To solve this problem, we use ResNet to extract semantic features, and segment the object areas from the image through Mask RCNN for information hiding. These selected object areas have ethical structural integrity and are not located in the visual center of the image, reducing the information loss caused by malicious attacks. Then, these object areas are binarized to generate hash sequences for information mapping. In transmission, only a set of stego-images unrelated to the secret information is transmitted, so the method can fundamentally resist steganalysis. At the same time, since both Mask RCNN and ResNet have excellent robustness, pre-training the model through supervised learning can achieve good performance. The robust hash algorithm can also resist attacks during transmission. Although image segmentation reduces the capacity, multiple object areas can be extracted from an image to preserve capacity to a certain extent. Experimental results show that, compared with other coverless image steganography methods, our method is more robust when facing geometric attacks.
Funding: Young Innovative Talents Project of Guangdong Ordinary Universities (No. 2022KQNCX225); School-level Teaching and Research Project of Guangzhou City Polytechnic (No. 2022xky046).
Abstract: Semantic segmentation methods based on CNNs have made great progress, but they still have shortcomings when applied to remote sensing image segmentation; for example, the small receptive field cannot effectively capture global context. To solve this problem, this paper proposes a hybrid model based on ResNet50 and swin transformer to directly capture long-range dependence, which fuses features through a Cross Feature Modulation Module (CFMM). Experiments on two publicly available datasets, Vaihingen and Potsdam, yield mIoU of 70.27% and 76.63%, respectively. Thus, CFM-UNet can maintain a high segmentation performance compared with other competitive networks.
Funding: Supported by the National Program on Key Basic Research Project (No. 2013CB329502), the National Natural Science Foundation of China (No. 61202212), the Special Research Project of the Educational Department of Shaanxi Province of China (No. 15JK1038), and the Key Research Project of Baoji University of Arts and Sciences (No. ZK16047).
Abstract: In recent years, the multimedia annotation problem has been attracting significant research attention in the multimedia and computer vision areas, especially automatic image annotation, whose purpose is to provide an efficient and effective searching environment for users to query their images more easily. In this paper, a semi-supervised learning based probabilistic latent semantic analysis (PLSA) model for automatic image annotation is presented. Since it is often hard to obtain or create labeled images in large quantities while unlabeled ones are easier to collect, a transductive support vector machine (TSVM) is exploited to enhance the quality of the training image data. Moreover, image features with different magnitudes result in different performance for automatic image annotation. To this end, a Gaussian normalization method is used to normalize the different features extracted from effective image regions segmented by the normalized cuts algorithm, so as to preserve the intrinsic content of images as completely as possible. Finally, a PLSA model with asymmetric modalities is constructed based on the expectation maximization (EM) algorithm to predict a candidate set of annotations with confidence scores. Extensive experiments on the general-purpose Corel5k dataset demonstrate that the proposed model can significantly improve on the performance of traditional PLSA for the task of automatic image annotation.
Funding: Supported by the National Natural Science Foundation under Grants No. 62271127, No. 61872405, and No. 81171411; the Natural Science Foundation of Sichuan Province, China under Grant No. 23NSFSC0627; and Medico-Engineering Cooperation Funds from University of Electronic Science and Technology of China and West China Hospital of Sichuan University under Grants No. ZYGX2022YGRH011 and No. HXDZ22005.
Abstract: Automatic segmentation of early esophagus cancer (EEC) in gastrointestinal endoscopy (GIE) images is a critical and challenging task in clinical settings, which relies primarily on labor-intensive and time-consuming routines. EEC has often been diagnosed at a late stage since early signs of cancer are not obvious, resulting in low survival rates. This work proposes a deep learning approach based on the U-Net++ method to segment EEC in GIE images. A total of 2690 GIE images collected from 617 patients at the Digestive Endoscopy Center, West China Hospital of Sichuan University, China, have been used. The proposed method is compared with other U-Net-related methods on the same dataset. The mean and standard deviation (SD) of the dice similarity coefficient (DSC), intersection over union (IoU), precision (Pre), and recall (Rec) achieved by the proposed framework were DSC (%) = 94.62±0.02, IoU (%) = 90.99±0.04, Pre (%) = 94.61±0.04, and Rec (%) = 95.00±0.02, respectively, outperforming the others. The proposed method has the potential to be applied in automatic EEC diagnosis.
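The four scores reported above have standard definitions in terms of true positives (TP), false positives (FP), and false negatives (FN). The sketch below computes them for one pair of binary masks. It is only an illustration: the function names, the toy masks, and the handling of empty denominators are assumptions, not details from the paper.

```python
def confusion_counts(pred_mask, true_mask):
    """Count TP, FP, FN over two equally sized binary masks (lists of rows)."""
    tp = fp = fn = 0
    for pred_row, true_row in zip(pred_mask, true_mask):
        for pred, true in zip(pred_row, true_row):
            if pred and true:
                tp += 1
            elif pred and not true:
                fp += 1
            elif true and not pred:
                fn += 1
    return tp, fp, fn


def _ratio(numerator, denominator, empty_value):
    """Divide, returning `empty_value` when the denominator is zero (assumed convention)."""
    return numerator / denominator if denominator else empty_value


def segmentation_metrics(pred_mask, true_mask):
    """Return DSC, IoU, precision, and recall for the foreground class."""
    tp, fp, fn = confusion_counts(pred_mask, true_mask)
    return {
        "dsc": _ratio(2 * tp, 2 * tp + fp + fn, 1.0),  # two empty masks agree
        "iou": _ratio(tp, tp + fp + fn, 1.0),
        "precision": _ratio(tp, tp + fp, 0.0),
        "recall": _ratio(tp, tp + fn, 0.0),
    }


# Toy 2x3 masks (hypothetical): 2 TP, 1 FP, 1 FN.
pred = [[1, 1, 0], [0, 1, 0]]
truth = [[1, 0, 0], [0, 1, 1]]
scores = segmentation_metrics(pred, truth)
```

For a single mask pair, DSC and IoU are tied by DSC = 2·IoU / (1 + IoU). Averages reported over a whole dataset, like those above, need not satisfy that identity exactly.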
Funding: This work was supported in part by the National Natural Science Foundation of China under Grants U20A20225 and 61833013, and in part by Shaanxi Provincial Key Research and Development Program under Grant 2022-GY111.
Abstract: This paper proposes an improved high-precision 3D semantic mapping method for indoor scenes using RGB-D images. Current semantic mapping algorithms suffer from low semantic annotation accuracy and insufficient real-time performance. To address these issues, we first adopt the Elastic Fusion algorithm to select key frames from indoor environment image sequences captured by the Kinect sensor and construct the indoor environment space model. Then, an indoor RGB-D image semantic segmentation network is proposed, which uses multi-scale feature fusion to quickly and accurately obtain pixel-level object labels for the spatial point cloud model. Finally, Bayesian updating is used to perform incremental semantic label fusion on the established spatial point cloud model. We also employ dense conditional random fields (CRF) to optimize the 3D semantic map model, resulting in a high-precision spatial semantic map of indoor scenes. Experimental results show that the proposed semantic mapping system can process image sequences collected by RGB-D sensors in real time, output accurate semantic segmentation results for indoor scene images together with the current local spatial semantic map, and ultimately construct a globally consistent, high-precision 3D semantic map of indoor scenes.
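Incremental label fusion by Bayesian updating is commonly implemented as a recursive per-point update: the class distribution stored for a map point is multiplied by the class probabilities predicted for it in the new frame, then renormalized. The sketch below shows that generic rule. The abstract does not spell out its formulation, so the function names, the uniform prior, and the example probabilities are assumptions.

```python
def bayes_update(prior, frame_probs):
    """Fuse one frame's class probabilities into a point's label distribution.

    posterior[c] is proportional to prior[c] * frame_probs[c]. If the product
    is zero for every class, the prior is kept unchanged (assumed fallback).
    """
    unnormalized = [p * q for p, q in zip(prior, frame_probs)]
    total = sum(unnormalized)
    if total == 0:
        return list(prior)
    return [value / total for value in unnormalized]


def fuse_frames(num_classes, frames):
    """Start from a uniform prior and fold in each frame's prediction in order."""
    belief = [1.0 / num_classes] * num_classes
    for frame_probs in frames:
        belief = bayes_update(belief, frame_probs)
    return belief


# Hypothetical per-frame predictions for one 3D point over three classes.
observations = [[0.6, 0.3, 0.1], [0.6, 0.3, 0.1], [0.2, 0.7, 0.1]]
belief = fuse_frames(3, observations)
best_class = max(range(len(belief)), key=belief.__getitem__)
```

Repeated agreeing observations sharpen the distribution, while one conflicting frame only shifts it, which is what makes the fused labels more stable than any single frame's prediction.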
Funding: This research was supported by the National Key Research and Development Program [2018YFF0213606-03 (Mu, Y., Hu, T.L., Gong, H., Li, S.J. and Sun, Y.H.) http://www.most.gov.cn]; the Jilin Province Science and Technology Development Plan focusing on research and development projects [20200402006NC (Mu, Y., Hu, T.L., Gong, H. and Li, S.J.) http://kjt.jl.gov.cn]; the science and technology support project for key industries in southern Xinjiang [2018DB001 (Gong, H., and Li, S.J.) http://kjj.xjbt.gov.cn]; and the key technology R&D project of Changchun Science and Technology Bureau of Jilin Province [21ZGN29 (Mu, Y., Bao, H.P., Wang X.B.) http://kjj.changchun.gov.cn].
Abstract: In order to accurately segment architectural features in high-resolution remote sensing images, a semantic segmentation method based on U-net network multi-task learning is proposed. First, a boundary distance map was generated based on the remote sensing image of the ground truth map of the building. The remote sensing image and its truth map were used as the input to the U-net network, followed by the addition of the building ground prediction layer at the end of the U-net network. Based on the ResNet network, a multi-task network with the boundary distance prediction layer was built. Experiments involving the ISPRS aerial remote sensing image building and feature annotation data set show that, compared with the full convolutional network combined with the multi-layer perceptron method, the intersection over union (IoU) of the VGG16 network, VGG16+boundary prediction, ResNet50, and the method in this paper increased by 5.15%, 6.946%, 6.41%, and 7.86%. The accuracy of the networks increased to 94.71%, 95.39%, 95.30%, and 96.10% respectively, which resulted in high-precision extraction of building features.
Funding: The National Natural Science Foundation of China (42001408, 61806097).
Abstract: Significant advancements have been achieved in road surface extraction based on high-resolution remote sensing image processing. Most current methods rely on fully supervised learning, which necessitates enormous human effort to label the image. Within this field, other research endeavors utilize weakly supervised methods. These approaches aim to reduce the expenses associated with annotation by leveraging sparsely annotated data, such as scribbles. This paper presents a novel technique called a weakly supervised network using scribble-supervised and edge-mask (WSSE-net). This network is a three-branch network architecture, whereby each branch is equipped with a distinct decoder module dedicated to road extraction tasks. One of the branches is dedicated to generating edge masks using edge detection algorithms and optimizing road edge details. The other two branches supervise the model’s training by employing scribble labels and spreading scribble information throughout the image. To address the long-standing flaw that pseudo-labels, once created, are not updated during network training, we use mixup to blend prediction results dynamically and continually update new pseudo-labels to steer network training. Our solution demonstrates efficient operation by simultaneously considering both edge-mask aid and dynamic pseudo-label support. The studies are conducted on three separate road datasets, which consist primarily of high-resolution remote-sensing satellite photos and drone images. The experimental findings suggest that our methodology performs better than advanced scribble-supervised approaches and specific traditional fully supervised methods.
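The abstract does not give the exact blending rule, so the following is only a generic mixup-style sketch of how pseudo-labels might be refreshed from new predictions: a convex combination with weight `lam`, with annotated scribble pixels left fixed. The function name, the fixed `lam`, and the scribble override are all assumptions made for illustration. Standard mixup usually samples the weight from a Beta distribution rather than fixing it.

```python
def blend_pseudo_labels(old_labels, predictions, lam, scribbles=None):
    """Blend old pseudo-labels with new predictions, pixel by pixel.

    new = lam * old + (1 - lam) * prediction, with 0 <= lam <= 1.
    `scribbles` optionally holds a ground-truth value (0 or 1) for annotated
    pixels and None elsewhere; annotated pixels are never overwritten.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    blended = []
    for r, (old_row, pred_row) in enumerate(zip(old_labels, predictions)):
        row = []
        for c, (old, pred) in enumerate(zip(old_row, pred_row)):
            fixed = scribbles[r][c] if scribbles is not None else None
            if fixed is not None:
                row.append(float(fixed))  # keep the human annotation
            else:
                row.append(lam * old + (1.0 - lam) * pred)
        blended.append(row)
    return blended


# Hypothetical 1x2 road-probability maps; the first pixel carries a scribble.
updated = blend_pseudo_labels(
    old_labels=[[0.0, 0.2]],
    predictions=[[0.0, 0.6]],
    lam=0.5,
    scribbles=[[1, None]],
)
```

Calling this once per training round lets the pseudo-labels track the improving network while the sparse scribble annotations stay as anchors.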
Funding: This work was supported by the National Natural Science Foundation of China (Grant No. U20A20197).
Abstract: We propose a novel image segmentation algorithm to tackle the challenge of limited recognition and segmentation performance in identifying welding seam images during robotic intelligent operations. First, to enhance the capability of deep neural networks to extract geometric attributes from depth images, we developed a novel deep geometric convolution operator (DGConv). DGConv is used to construct a deep local geometric feature extraction module, facilitating a more comprehensive exploration of the intrinsic geometric information within depth images. Second, we integrate the newly proposed deep geometric feature module with the Fully Convolutional Network (FCN8) to establish a high-performance deep neural network algorithm tailored for depth image segmentation. Concurrently, we enhance the FCN8 detection head by separating the segmentation and classification processes. This enhancement significantly boosts the network’s overall detection capability. Third, for a comprehensive assessment of our proposed algorithm and its applicability in real-world industrial settings, we curated a line-scan image dataset featuring weld seams. This dataset, named the Standardized Linear Depth Profile (SLDP) dataset, was collected from actual industrial sites where autonomous robots are in operation. Finally, we conducted experiments on the SLDP dataset, achieving an average accuracy of 92.7%. Our proposed approach exhibited a remarkable performance improvement over the prior method on the same dataset. Moreover, we have successfully deployed the proposed algorithm in real industrial environments, fulfilling the prerequisites of unmanned robot operations.
Funding: Supported in part by the National Key Research and Development Program of China (Grant No. 2019YFA0706200).
Abstract: In recent years, the Internet of Things (IoT) has gradually developed applications such as collecting sensory data and building intelligent services, which has led to an explosion in mobile data traffic. Meanwhile, with the rapid development of artificial intelligence, semantic communication has attracted great attention as a new communication paradigm. For IoT devices, however, processing image information efficiently in real time is an essential task for the rapid transmission of semantic information. As the number of model parameters in deep learning methods increases, the model inference time on sensor devices continues to increase. In contrast, the Pulse Coupled Neural Network (PCNN) has fewer parameters, making it more suitable for real-time scene tasks such as image segmentation, which lays the foundation for real-time, effective, and accurate image transmission. However, the parameters of PCNN are determined by trial and error, which limits its application. To overcome this limitation, an Improved Pulse Coupled Neural Networks (IPCNN) model is proposed in this work. The IPCNN constructs the connection between the static properties of the input image and the dynamic properties of the neurons, and all its parameters are set adaptively, which avoids the inconvenience of manual setting in traditional methods and improves the adaptability of parameters to different types of images. Experimental segmentation results demonstrate the validity and efficiency of the proposed self-adaptive parameter setting method of IPCNN on gray images and natural images from the Matlab and Berkeley Segmentation Datasets. The IPCNN method achieves a better segmentation result without training, providing a new solution for the real-time transmission of image semantic information.
Abstract: Lower back pain is one of the most common medical problems in the world and is experienced by a large percentage of people everywhere. Owing to its ability to produce a detailed view of the soft tissues, including the spinal cord, nerves, intervertebral discs, and vertebrae, Magnetic Resonance Imaging is thought to be the most effective method for imaging the spine. The semantic segmentation of vertebrae plays a major role in the diagnostic process of lumbar diseases. It is difficult to semantically partition the vertebrae in Magnetic Resonance Images from the surrounding variety of tissues, including muscles, ligaments, and intervertebral discs. U-Net is a powerful deep-learning architecture that handles the challenges of medical image analysis tasks and achieves high segmentation accuracy. This work proposes a modified U-Net architecture, namely MU-Net, consisting of the Meijering convolutional layer that incorporates the Meijering filter to perform the semantic segmentation of lumbar vertebrae L1 to L5 and sacral vertebra S1. Pseudo-colour mask images were generated and used as ground truth for training the model. The work has been carried out on 1312 images expanded from T1-weighted mid-sagittal MRI images of 515 patients in the Lumbar Spine MRI Dataset publicly available from Mendeley Data. The proposed MU-Net model for the semantic segmentation of the lumbar vertebrae gives better performance, with 98.79% pixel accuracy (PA), 98.66% dice similarity coefficient (DSC), 97.36% Jaccard coefficient, and 92.55% mean Intersection over Union (mean IoU) on the mentioned dataset.