With the rapid development of artificial intelligence and the widespread use of the Internet of Things, semantic communication, as an emerging communication paradigm, has been attracting great interest. Taking image t...With the rapid development of artificial intelligence and the widespread use of the Internet of Things, semantic communication, as an emerging communication paradigm, has been attracting great interest. Taking image transmission as an example, from the semantic communication's view, not all pixels in the images are equally important for certain receivers. The existing semantic communication systems directly perform semantic encoding and decoding on the whole image, in which the region of interest cannot be identified. In this paper, we propose a novel semantic communication system for image transmission that can distinguish between Regions Of Interest (ROI) and Regions Of Non-Interest (RONI) based on semantic segmentation, where a semantic segmentation algorithm is used to classify each pixel of the image and distinguish ROI and RONI. The system also enables high-quality transmission of ROI with lower communication overheads by transmissions through different semantic communication networks with different bandwidth requirements. An improved metric θPSNR is proposed to evaluate the transmission accuracy of the novel semantic transmission network. Experimental results show that our proposed system achieves a significant performance improvement compared with existing approaches, namely, existing semantic communication approaches and the conventional approach without semantics.展开更多
In recent years,semantic segmentation on 3D point cloud data has attracted much attention.Unlike 2D images where pixels distribute regularly in the image domain,3D point clouds in non-Euclidean space are irregular and...In recent years,semantic segmentation on 3D point cloud data has attracted much attention.Unlike 2D images where pixels distribute regularly in the image domain,3D point clouds in non-Euclidean space are irregular and inherently sparse.Therefore,it is very difficult to extract long-range contexts and effectively aggregate local features for semantic segmentation in 3D point cloud space.Most current methods either focus on local feature aggregation or long-range context dependency,but fail to directly establish a global-local feature extractor to complete the point cloud semantic segmentation tasks.In this paper,we propose a Transformer-based stratified graph convolutional network(SGT-Net),which enlarges the effective receptive field and builds direct long-range dependency.Specifically,we first propose a novel dense-sparse sampling strategy that provides dense local vertices and sparse long-distance vertices for subsequent graph convolutional network(GCN).Secondly,we propose a multi-key self-attention mechanism based on the Transformer to further weight augmentation for crucial neighboring relationships and enlarge the effective receptive field.In addition,to further improve the efficiency of the network,we propose a similarity measurement module to determine whether the neighborhood near the center point is effective.We demonstrate the validity and superiority of our method on the S3DIS and ShapeNet datasets.Through ablation experiments and segmentation visualization,we verify that the SGT model can improve the performance of the point cloud semantic segmentation.展开更多
Automated optical inspection(AOI)is a significant process in printed circuit board assembly(PCBA)production lines which aims to detect tiny defects in PCBAs.Existing AOI equipment has several deficiencies including lo...Automated optical inspection(AOI)is a significant process in printed circuit board assembly(PCBA)production lines which aims to detect tiny defects in PCBAs.Existing AOI equipment has several deficiencies including low throughput,large computation cost,high latency,and poor flexibility,which limits the efficiency of online PCBA inspection.In this paper,a novel PCBA defect detection method based on a lightweight deep convolution neural network is proposed.In this method,the semantic segmentation model is combined with a rule-based defect recognition algorithm to build up a defect detection frame-work.To improve the performance of the model,extensive real PCBA images are collected from production lines as datasets.Some optimization methods have been applied in the model according to production demand and enable integration in lightweight computing devices.Experiment results show that the production line using our method realizes a throughput more than three times higher than traditional methods.Our method can be integrated into a lightweight inference system and pro-mote the flexibility of AOI.The proposed method builds up a general paradigm and excellent example for model design and optimization oriented towards industrial requirements.展开更多
Semantic segmentation of driving scene images is crucial for autonomous driving.While deep learning technology has significantly improved daytime image semantic segmentation,nighttime images pose challenges due to fac...Semantic segmentation of driving scene images is crucial for autonomous driving.While deep learning technology has significantly improved daytime image semantic segmentation,nighttime images pose challenges due to factors like poor lighting and overexposure,making it difficult to recognize small objects.To address this,we propose an Image Adaptive Enhancement(IAEN)module comprising a parameter predictor(Edip),multiple image processing filters(Mdif),and a Detail Processing Module(DPM).Edip combines image processing filters to predict parameters like exposure and hue,optimizing image quality.We adopt a novel image encoder to enhance parameter prediction accuracy by enabling Edip to handle features at different scales.DPM strengthens overlooked image details,extending the IAEN module’s functionality.After the segmentation network,we integrate a Depth Guided Filter(DGF)to refine segmentation outputs.The entire network is trained end-to-end,with segmentation results guiding parameter prediction optimization,promoting self-learning and network improvement.This lightweight and efficient network architecture is particularly suitable for addressing challenges in nighttime image segmentation.Extensive experiments validate significant performance improvements of our approach on the ACDC-night and Nightcity datasets.展开更多
Intelligent vehicle tracking and detection are crucial tasks in the realm of highway management.However,vehicles come in a range of sizes,which is challenging to detect,affecting the traffic monitoring system’s overa...Intelligent vehicle tracking and detection are crucial tasks in the realm of highway management.However,vehicles come in a range of sizes,which is challenging to detect,affecting the traffic monitoring system’s overall accuracy.Deep learning is considered to be an efficient method for object detection in vision-based systems.In this paper,we proposed a vision-based vehicle detection and tracking system based on a You Look Only Once version 5(YOLOv5)detector combined with a segmentation technique.The model consists of six steps.In the first step,all the extracted traffic sequence images are subjected to pre-processing to remove noise and enhance the contrast level of the images.These pre-processed images are segmented by labelling each pixel to extract the uniform regions to aid the detection phase.A single-stage detector YOLOv5 is used to detect and locate vehicles in images.Each detection was exposed to Speeded Up Robust Feature(SURF)feature extraction to track multiple vehicles.Based on this,a unique number is assigned to each vehicle to easily locate them in the succeeding image frames by extracting them using the feature-matching technique.Further,we implemented a Kalman filter to track multiple vehicles.In the end,the vehicle path is estimated by using the centroid points of the rectangular bounding box predicted by the tracking algorithm.The experimental results and comparison reveal that our proposed vehicle detection and tracking system outperformed other state-of-the-art systems.The proposed implemented system provided 94.1%detection precision for Roundabout and 96.1%detection precision for Vehicle Aerial Imaging from Drone(VAID)datasets,respectively.展开更多
The detection of crack defects on the walls of road tunnels is a crucial step in the process of ensuring travel safetyand performing routine tunnel maintenance. The automatic and accurate detection of cracks on the su...The detection of crack defects on the walls of road tunnels is a crucial step in the process of ensuring travel safetyand performing routine tunnel maintenance. The automatic and accurate detection of cracks on the surface of roadtunnels is the key to improving the maintenance efficiency of road tunnels. Machine vision technology combinedwith a deep neural network model is an effective means to realize the localization and identification of crackdefects on the surface of road tunnels.We propose a complete set of automatic inspection methods for identifyingcracks on the walls of road tunnels as a solution to the problem of difficulty in identifying cracks during manualmaintenance. First, a set of equipment applied to the real-time acquisition of high-definition images of walls inroad tunnels is designed. Images of walls in road tunnels are acquired based on the designed equipment, whereimages containing crack defects are manually identified and selected. Subsequently, the training and validationsets used to construct the crack inspection model are obtained based on the acquired images, whereas the regionscontaining cracks and the pixels of the cracks are finely labeled. After that, a crack area sensing module is designedbased on the proposed you only look once version 7 model combined with coordinate attention mechanism (CAYOLOV7) network to locate the crack regions in the road tunnel surface images. Only subimages containingcracks are acquired and sent to the multiscale semantic segmentation module for extraction of the pixels to whichthe cracks belong based on the DeepLab V3+ network. The precision and recall of the crack region localizationon the surface of a road tunnel based on our proposed method are 82.4% and 93.8%, respectively. Moreover, themean intersection over union (MIoU) and pixel accuracy (PA) values for achieving pixel-level detection accuracyare 76.84% and 78.29%, respectively. The experimental results on the dataset show that our proposed two-stagedetection method outperforms other state-of-the-art models in crack region localization and detection. Based onour proposedmethod, the images captured on the surface of a road tunnel can complete crack detection at a speed often frames/second, and the detection accuracy can reach 0.25 mm, which meets the requirements for maintenanceof an actual project. The designed CA-YOLO V7 network enables precise localization of the area to which a crackbelongs in images acquired under different environmental and lighting conditions in road tunnels. The improvedDeepLab V3+ network based on lightweighting is able to extract crack morphology in a given region more quicklywhile maintaining segmentation accuracy. The established model combines defect localization and segmentationmodels for the first time, realizing pixel-level defect localization and extraction on the surface of road tunnelsin complex environments, and is capable of determining the actual size of cracks based on the physical coordinatesystemafter camera calibration. The trainedmodelhas highaccuracy andcanbe extendedandapplied to embeddedcomputing devices for the assessment and repair of damaged areas in different types of road tunnels.展开更多
This paper focuses on the task of few-shot 3D point cloud semantic segmentation.Despite some progress,this task still encounters many issues due to the insufficient samples given,e.g.,incomplete object segmentation an...This paper focuses on the task of few-shot 3D point cloud semantic segmentation.Despite some progress,this task still encounters many issues due to the insufficient samples given,e.g.,incomplete object segmentation and inaccurate semantic discrimination.To tackle these issues,we first leverage part-whole relationships into the task of 3D point cloud semantic segmentation to capture semantic integrity,which is empowered by the dynamic capsule routing with the module of 3D Capsule Networks(CapsNets)in the embedding network.Concretely,the dynamic routing amalgamates geometric information of the 3D point cloud data to construct higher-level feature representations,which capture the relationships between object parts and their wholes.Secondly,we designed a multi-prototype enhancement module to enhance the prototype discriminability.Specifically,the single-prototype enhancement mechanism is expanded to the multi-prototype enhancement version for capturing rich semantics.Besides,the shot-correlation within the category is calculated via the interaction of different samples to enhance the intra-category similarity.Ablation studies prove that the involved part-whole relations and proposed multi-prototype enhancement module help to achieve complete object segmentation and improve semantic discrimination.Moreover,under the integration of these two modules,quantitative and qualitative experiments on two public benchmarks,including S3DIS and ScanNet,indicate the superior performance of the proposed framework on the task of 3D point cloud semantic segmentation,compared to some state-of-the-art methods.展开更多
High-resolution remote sensing image segmentation is a challenging task. In urban remote sensing, the presenceof occlusions and shadows often results in blurred or invisible object boundaries, thereby increasing the d...High-resolution remote sensing image segmentation is a challenging task. In urban remote sensing, the presenceof occlusions and shadows often results in blurred or invisible object boundaries, thereby increasing the difficultyof segmentation. In this paper, an improved network with a cross-region self-attention mechanism for multi-scalefeatures based onDeepLabv3+is designed to address the difficulties of small object segmentation and blurred targetedge segmentation. First,we use CrossFormer as the backbone feature extraction network to achieve the interactionbetween large- and small-scale features, and establish self-attention associations between features at both large andsmall scales to capture global contextual feature information. Next, an improved atrous spatial pyramid poolingmodule is introduced to establish multi-scale feature maps with large- and small-scale feature associations, andattention vectors are added in the channel direction to enable adaptive adjustment of multi-scale channel features.The proposed networkmodel is validated using the PotsdamandVaihingen datasets. The experimental results showthat, compared with existing techniques, the network model designed in this paper can extract and fuse multiscaleinformation, more clearly extract edge information and small-scale information, and segment boundariesmore smoothly. Experimental results on public datasets demonstrate the superiority of ourmethod compared withseveral state-of-the-art networks.展开更多
This paper focuses on the effective utilization of data augmentation techniques for 3Dlidar point clouds to enhance the performance of neural network models.These point clouds,which represent spatial information throu...This paper focuses on the effective utilization of data augmentation techniques for 3Dlidar point clouds to enhance the performance of neural network models.These point clouds,which represent spatial information through a collection of 3D coordinates,have found wide-ranging applications.Data augmentation has emerged as a potent solution to the challenges posed by limited labeled data and the need to enhance model generalization capabilities.Much of the existing research is devoted to crafting novel data augmentation methods specifically for 3D lidar point clouds.However,there has been a lack of focus on making the most of the numerous existing augmentation techniques.Addressing this deficiency,this research investigates the possibility of combining two fundamental data augmentation strategies.The paper introduces PolarMix andMix3D,two commonly employed augmentation techniques,and presents a new approach,named RandomFusion.Instead of using a fixed or predetermined combination of augmentation methods,RandomFusion randomly chooses one method from a pool of options for each instance or sample.This innovative data augmentation technique randomly augments each point in the point cloud with either PolarMix or Mix3D.The crux of this strategy is the random choice between PolarMix and Mix3Dfor the augmentation of each point within the point cloud data set.The results of the experiments conducted validate the efficacy of the RandomFusion strategy in enhancing the performance of neural network models for 3D lidar point cloud semantic segmentation tasks.This is achieved without compromising computational efficiency.By examining the potential of merging different augmentation techniques,the research contributes significantly to a more comprehensive understanding of how to utilize existing augmentation methods for 3D lidar point clouds.RandomFusion data augmentation technique offers a simple yet effective method to leverage the diversity of augmentation techniques and boost the robustness of models.The insights gained from this research can pave the way for future work aimed at developing more advanced and efficient data augmentation strategies for 3D lidar point cloud analysis.展开更多
In cornfields,factors such as the similarity between corn seedlings and weeds and the blurring of plant edge details pose challenges to corn and weed segmentation.In addition,remote areas such as farmland are usually ...In cornfields,factors such as the similarity between corn seedlings and weeds and the blurring of plant edge details pose challenges to corn and weed segmentation.In addition,remote areas such as farmland are usually constrained by limited computational resources and limited collected data.Therefore,it becomes necessary to lighten the model to better adapt to complex cornfield scene,and make full use of the limited data information.In this paper,we propose an improved image segmentation algorithm based on unet.Firstly,the inverted residual structure is introduced into the contraction path to reduce the number of parameters in the training process and improve the feature extraction ability;secondly,the pyramid pooling module is introduced to enhance the network’s ability of acquiring contextual information as well as the ability of dealing with the small target loss problem;and lastly,Finally,to further enhance the segmentation capability of the model,the squeeze and excitation mechanism is introduced in the expansion path.We used images of corn seedlings collected in the field and publicly available corn weed datasets to evaluate the improved model.The improved model has a total parameter of 3.79 M and miou can achieve 87.9%.The fps on a single 3050 ti video card is about 58.9.The experimental results show that the network proposed in this paper can quickly segment corn weeds in a cornfield scenario with good segmentation accuracy.展开更多
Computed Tomography(CT)is a commonly used technology in Printed Circuit Boards(PCB)non-destructive testing,and element segmentation of CT images is a key subsequent step.With the development of deep learning,researche...Computed Tomography(CT)is a commonly used technology in Printed Circuit Boards(PCB)non-destructive testing,and element segmentation of CT images is a key subsequent step.With the development of deep learning,researchers began to exploit the“pre-training and fine-tuning”training process for multi-element segmentation,reducing the time spent on manual annotation.However,the existing element segmentation model only focuses on the overall accuracy at the pixel level,ignoring whether the element connectivity relationship can be correctly identified.To this end,this paper proposes a PCB CT image element segmentation model optimizing the semantic perception of connectivity relationship(OSPC-seg).The overall training process adopts a“pre-training and fine-tuning”training process.A loss function that optimizes the semantic perception of circuit connectivity relationship(OSPC Loss)is designed from the aspect of alleviating the class imbalance problem and improving the correct connectivity rate.Also,the correct connectivity rate index(CCR)is proposed to evaluate the model’s connectivity relationship recognition capabilities.Experiments show that mIoU and CCR of OSPC-seg on our datasets are 90.1%and 97.0%,improved by 1.5%and 1.6%respectively compared with the baseline model.From visualization results,it can be seen that the segmentation performance of connection positions is significantly improved,which also demonstrates the effectiveness of OSPC-seg.展开更多
Breast Arterial Calcification(BAC)is a mammographic decision dissimilar to cancer and commonly observed in elderly women.Thus identifying BAC could provide an expense,and be inaccurate.Recently Deep Learning(DL)method...Breast Arterial Calcification(BAC)is a mammographic decision dissimilar to cancer and commonly observed in elderly women.Thus identifying BAC could provide an expense,and be inaccurate.Recently Deep Learning(DL)methods have been introduced for automatic BAC detection and quantification with increased accuracy.Previously,classification with deep learning had reached higher efficiency,but designing the structure of DL proved to be an extremely challenging task due to overfitting models.It also is not able to capture the patterns and irregularities presented in the images.To solve the overfitting problem,an optimal feature set has been formed by Enhanced Wolf Pack Algorithm(EWPA),and their irregularities are identified by Dense-kUNet segmentation.In this paper,Dense-kUNet for segmentation and optimal feature has been introduced for classification(severe,mild,light)that integrates DenseUNet and kU-Net.Longer bound links exist among adjacent modules,allowing relatively rough data to be sent to the following component and assisting the system in finding higher qualities.The major contribution of the work is to design the best features selected by Enhanced Wolf Pack Algorithm(EWPA),and Modified Support Vector Machine(MSVM)based learning for classification.k-Dense-UNet is introduced which combines the procedure of Dense-UNet and kU-Net for image segmentation.Longer bound associations occur among nearby sections,allowing relatively granular data to be sent to the next subsystem and benefiting the system in recognizing smaller characteristics.The proposed techniques and the performance are tested using several types of analysis techniques 826 filled digitized mammography.The proposed method achieved the highest precision,recall,F-measure,and accuracy of 84.4333%,84.5333%,84.4833%,and 86.8667%when compared to other methods on the Digital Database for Screening Mammography(DDSM).展开更多
With the development of underwater sonar detection technology,simultaneous localization and mapping(SLAM)approach has attracted much attention in underwater navigation field in recent years.But the weak detection abil...With the development of underwater sonar detection technology,simultaneous localization and mapping(SLAM)approach has attracted much attention in underwater navigation field in recent years.But the weak detection ability of a single vehicle limits the SLAM performance in wide areas.Thereby,cooperative SLAM using multiple vehicles has become an important research direction.The key factor of cooperative SLAM is timely and efficient sonar image transmission among underwater vehicles.However,the limited bandwidth of underwater acoustic channels contradicts a large amount of sonar image data.It is essential to compress the images before transmission.Recently,deep neural networks have great value in image compression by virtue of the powerful learning ability of neural networks,but the existing sonar image compression methods based on neural network usually focus on the pixel-level information without the semantic-level information.In this paper,we propose a novel underwater acoustic transmission scheme called UAT-SSIC that includes semantic segmentation-based sonar image compression(SSIC)framework and the joint source-channel codec,to improve the accuracy of the semantic information of the reconstructed sonar image at the receiver.The SSIC framework consists of Auto-Encoder structure-based sonar image compression network,which is measured by a semantic segmentation network's residual.Considering that sonar images have the characteristics of blurred target edges,the semantic segmentation network used a special dilated convolution neural network(DiCNN)to enhance segmentation accuracy by expanding the range of receptive fields.The joint source-channel codec with unequal error protection is proposed that adjusts the power level of the transmitted data,which deal with sonar image transmission error caused by the serious underwater acoustic channel.Experiment results demonstrate that our method preserves more semantic information,with advantages over existing methods at the same compression ratio.It also improves the error tolerance and packet loss resistance of transmission.展开更多
As the field of autonomous driving evolves, real-time semantic segmentation has become a crucial part of computer vision tasks. However, most existing methods use lightweight convolution to reduce the computational ef...As the field of autonomous driving evolves, real-time semantic segmentation has become a crucial part of computer vision tasks. However, most existing methods use lightweight convolution to reduce the computational effort, resulting in lower accuracy. To address this problem, we construct TBANet, a network with an encoder-decoder structure for efficient feature extraction. In the encoder part, the TBA module is designed to extract details and the ETBA module is used to learn semantic representations in a high-dimensional space. In the decoder part, we design a combination of multiple upsampling methods to aggregate features with less computational overhead. We validate the efficiency of TBANet on the Cityscapes dataset. It achieves 75.1% mean Intersection over Union(mIoU) with only 2.07 million parameters and can reach 90.3 Frames Per Second(FPS).展开更多
Because pixel values of foggy images are irregularly higher than those of images captured in normal weather(clear images),it is difficult to extract and express their texture.No method has previously been developed to...Because pixel values of foggy images are irregularly higher than those of images captured in normal weather(clear images),it is difficult to extract and express their texture.No method has previously been developed to directly explore the relationship between foggy images and semantic segmentation images.We investigated this relationship and propose a generative adversarial network(GAN)for foggy image semantic segmentation(FISS GAN),which contains two parts:an edge GAN and a semantic segmentation GAN.The edge GAN is designed to generate edge information from foggy images to provide auxiliary information to the semantic segmentation GAN.The semantic segmentation GAN is designed to extract and express the texture of foggy images and generate semantic segmentation images.Experiments on foggy cityscapes datasets and foggy driving datasets indicated that FISS GAN achieved state-of-the-art performance.展开更多
Semantic segmentation is a crucial step for document understanding.In this paper,an NVIDIA Jetson Nano-based platform is applied for implementing semantic segmentation for teaching artificial intelligence concepts and...Semantic segmentation is a crucial step for document understanding.In this paper,an NVIDIA Jetson Nano-based platform is applied for implementing semantic segmentation for teaching artificial intelligence concepts and programming.To extract semantic structures from document images,we present an end-to-end dilated convolution network architecture.Dilated convolutions have well-known advantages for extracting multi-scale context information without losing spatial resolution.Our model utilizes dilated convolutions with residual network to represent the image features and predicting pixel labels.The convolution part works as feature extractor to obtain multidimensional and hierarchical image features.The consecutive deconvolution is used for producing full resolution segmentation prediction.The probability of each pixel decides its predefined semantic class label.To understand segmentation granularity,we compare performances at three different levels.From fine grained class to coarse class levels,the proposed dilated convolution network architecture is evaluated on three document datasets.The experimental results have shown that both semantic data distribution imbalance and network depth are import factors that influence the document’s semantic segmentation performances.The research is aimed at offering an education resource for teaching artificial intelligence concepts and techniques.展开更多
In the study of automatic driving,understanding the road scene is a key to improve driving safety.The semantic segmentation method could divide the image into different areas associated with semantic categories in acc...In the study of automatic driving,understanding the road scene is a key to improve driving safety.The semantic segmentation method could divide the image into different areas associated with semantic categories in accordance with the pixel level,so as to help vehicles to perceive and obtain the surrounding road environment information,which would improve driving safety.Deeplabv3+is the current popular semantic segmentation model.There are phenomena that small targets are missed and similar objects are easily misjudged during its semantic segmentation tasks,which leads to rough segmentation boundary and reduces semantic accuracy.This study focuses on the issue,based on the Deeplabv3+network structure and combined with the attention mechanism,to increase the weight of the segmentation area,and then proposes an improved Deeplabv3+fusion attention mechanism for road scene semantic segmentation method.First,a group of parallel position attention module and channel attention module are introduced on the Deeplabv3+encoding end to capture more spatial context information and high-level semantic information.Then,an attention mechanism is introduced to restore the spatial detail information,and the data shall be normalized in order to accelerate the convergence speed of the model at the decoding end.The effects of model segmentation with different attention-introducing mechanisms are compared and tested on CamVid and Cityscapes datasets.The experimental results show that the mean Intersection over Unons of the improved model segmentation accuracies on the two datasets are boosted by 6.88%and 2.58%,respectively,which is better than using Deeplabv3+.This method does not significantly increase the amount of network calculation and complexity,and has a good balance of speed and accuracy.展开更多
Multiple ocular region segmentation plays an important role in different applications such as biometrics,liveness detection,healthcare,and gaze estimation.Typically,segmentation techniques focus on a single region of ...Multiple ocular region segmentation plays an important role in different applications such as biometrics,liveness detection,healthcare,and gaze estimation.Typically,segmentation techniques focus on a single region of the eye at a time.Despite the number of obvious advantages,very limited research has focused on multiple regions of the eye.Similarly,accurate segmentation of multiple eye regions is necessary in challenging scenarios involving blur,ghost effects low resolution,off-angles,and unusual glints.Currently,the available segmentation methods cannot address these constraints.In this paper,to address the accurate segmentation of multiple eye regions in unconstrainted scenarios,a lightweight outer residual encoder-decoder network suitable for various sensor images is proposed.The proposed method can determine the true boundaries of the eye regions from inferior-quality images using the high-frequency information flow from the outer residual encoder-decoder deep convolutional neural network(called ORED-Net).Moreover,the proposed ORED-Net model does not improve the performance based on the complexity,number of parameters or network depth.The proposed network is considerably lighter than previous state-of-theart models.Comprehensive experiments were performed,and optimal performance was achieved using SBVPI and UBIRIS.v2 datasets containing images of the eye region.The simulation results obtained using the proposed OREDNet,with the mean intersection over union score(mIoU)of 89.25 and 85.12 on the challenging SBVPI and UBIRIS.v2 datasets,respectively.展开更多
This paper proposes a novel method of lane detection,which adopts VGG16 as the basis of convolutional neural network to extract lane line features by cavity convolution,wherein the lane lines are divided into dotted l...This paper proposes a novel method of lane detection,which adopts VGG16 as the basis of convolutional neural network to extract lane line features by cavity convolution,wherein the lane lines are divided into dotted lines and solid lines.Expanding the field of experience through hollow convolution,the full connection layer of the network is discarded,the last largest pooling layer of the VGG16 network is removed,and the processing of the last three convolution layers is replaced by hole convolution.At the same time,CNN adopts the encoder and decoder structure mode,and uses the index function of the maximum pooling layer in the decoder part to upsample the encoder in a counter-pooling manner,realizing semantic segmentation.And combined with the instance segmentation,and finally through the fitting to achieve the detection of the lane line.In addition,the currently disclosed lane line data sets are relatively small,and there is no distinction between lane solid lines and dashed lines.To this end,our work made a lane line data set for the lane virtual and real identification,and based on the proposed algorithm effective verification of the data set achieved by the increased segmentation.The final test shows that the proposed method has a good balance between lane detection speed and accuracy,which has good robustness.展开更多
Buildings undergo various kinds of structural damage during earthquakes,and damage detection and functional assessment of these structures in the aftermath of the events have been challenging issues.Under these circum...Buildings undergo various kinds of structural damage during earthquakes,and damage detection and functional assessment of these structures in the aftermath of the events have been challenging issues.Under these circumstances,computer vision techniques offer a promising solution by automating the inspection process.This study presents an effective methodology for automatic structural components and damage detection using unmanned aerial vehicle(UAV)images of damaged buildings.Two types of neural network architectures are considered for appropriate feature extractions in different task detections.The feature pyramid network(FPN)is employed for crack,spall,rebar,and component damage segmentation,while the UNet++network is utilized for the damage state.For network training and validation,a total of 3805 original images of size 1920×1080 pixels are processed by the proposed method and reduced the image pixels.From the FPN,the achieved highest intersection over unions(IoUs)were 0.59,0.93,0.42,and 0.99 for crack,spall,rebar,and components,respectively.These predicted labels were found in close agreement with the labels.Similarly,the UNet++recognized the semantic information and damage state with an IoU of 0.72.This demonstrated the applicability of the proposed method for automated post-earthquake building inspection process accurately without information loss from the original images.展开更多
基金supported in part by collaborative research with Toyota Motor Corporation,in part by ROIS NII Open Collaborative Research under Grant 21S0601,in part by JSPS KAKENHI under Grants 20H00592,21H03424.
文摘With the rapid development of artificial intelligence and the widespread use of the Internet of Things, semantic communication, as an emerging communication paradigm, has been attracting great interest. Taking image transmission as an example, from the semantic communication's view, not all pixels in the images are equally important for certain receivers. The existing semantic communication systems directly perform semantic encoding and decoding on the whole image, in which the region of interest cannot be identified. In this paper, we propose a novel semantic communication system for image transmission that can distinguish between Regions Of Interest (ROI) and Regions Of Non-Interest (RONI) based on semantic segmentation, where a semantic segmentation algorithm is used to classify each pixel of the image and distinguish ROI and RONI. The system also enables high-quality transmission of ROI with lower communication overheads by transmissions through different semantic communication networks with different bandwidth requirements. An improved metric θPSNR is proposed to evaluate the transmission accuracy of the novel semantic transmission network. Experimental results show that our proposed system achieves a significant performance improvement compared with existing approaches, namely, existing semantic communication approaches and the conventional approach without semantics.
基金supported in part by the National Natural Science Foundation of China under Grant Nos.U20A20197,62306187the Foundation of Ministry of Industry and Information Technology TC220H05X-04.
文摘In recent years,semantic segmentation on 3D point cloud data has attracted much attention.Unlike 2D images where pixels distribute regularly in the image domain,3D point clouds in non-Euclidean space are irregular and inherently sparse.Therefore,it is very difficult to extract long-range contexts and effectively aggregate local features for semantic segmentation in 3D point cloud space.Most current methods either focus on local feature aggregation or long-range context dependency,but fail to directly establish a global-local feature extractor to complete the point cloud semantic segmentation tasks.In this paper,we propose a Transformer-based stratified graph convolutional network(SGT-Net),which enlarges the effective receptive field and builds direct long-range dependency.Specifically,we first propose a novel dense-sparse sampling strategy that provides dense local vertices and sparse long-distance vertices for subsequent graph convolutional network(GCN).Secondly,we propose a multi-key self-attention mechanism based on the Transformer to further weight augmentation for crucial neighboring relationships and enlarge the effective receptive field.In addition,to further improve the efficiency of the network,we propose a similarity measurement module to determine whether the neighborhood near the center point is effective.We demonstrate the validity and superiority of our method on the S3DIS and ShapeNet datasets.Through ablation experiments and segmentation visualization,we verify that the SGT model can improve the performance of the point cloud semantic segmentation.
基金supported in part by the IoT Intelligent Microsystem Center of Tsinghua University-China Mobile Joint Research Institute.
文摘Automated optical inspection(AOI)is a significant process in printed circuit board assembly(PCBA)production lines which aims to detect tiny defects in PCBAs.Existing AOI equipment has several deficiencies including low throughput,large computation cost,high latency,and poor flexibility,which limits the efficiency of online PCBA inspection.In this paper,a novel PCBA defect detection method based on a lightweight deep convolution neural network is proposed.In this method,the semantic segmentation model is combined with a rule-based defect recognition algorithm to build up a defect detection frame-work.To improve the performance of the model,extensive real PCBA images are collected from production lines as datasets.Some optimization methods have been applied in the model according to production demand and enable integration in lightweight computing devices.Experiment results show that the production line using our method realizes a throughput more than three times higher than traditional methods.Our method can be integrated into a lightweight inference system and pro-mote the flexibility of AOI.The proposed method builds up a general paradigm and excellent example for model design and optimization oriented towards industrial requirements.
基金This work is supported in part by The National Natural Science Foundation of China(Grant Number 61971078),which provided domain expertise and computational power that greatly assisted the activityThis work was financially supported by Chongqing Municipal Education Commission Grants for-Major Science and Technology Project(Grant Number gzlcx20243175).
文摘Semantic segmentation of driving scene images is crucial for autonomous driving.While deep learning technology has significantly improved daytime image semantic segmentation,nighttime images pose challenges due to factors like poor lighting and overexposure,making it difficult to recognize small objects.To address this,we propose an Image Adaptive Enhancement(IAEN)module comprising a parameter predictor(Edip),multiple image processing filters(Mdif),and a Detail Processing Module(DPM).Edip combines image processing filters to predict parameters like exposure and hue,optimizing image quality.We adopt a novel image encoder to enhance parameter prediction accuracy by enabling Edip to handle features at different scales.DPM strengthens overlooked image details,extending the IAEN module’s functionality.After the segmentation network,we integrate a Depth Guided Filter(DGF)to refine segmentation outputs.The entire network is trained end-to-end,with segmentation results guiding parameter prediction optimization,promoting self-learning and network improvement.This lightweight and efficient network architecture is particularly suitable for addressing challenges in nighttime image segmentation.Extensive experiments validate significant performance improvements of our approach on the ACDC-night and Nightcity datasets.
基金This researchwas supported by the Deanship of ScientificResearch at Najran University,under the Research Group Funding Program Grant Code(NU/RG/SERC/12/30)This research is supported and funded by Princess Nourah bint Abdulrahman University Researchers Supporting Project Number(PNURSP2024R410)+1 种基金Princess Nourah bint Abdulrahman University,Riyadh,Saudi ArabiaThis study is supported via funding from Prince Sattam bin Abdulaziz University Project Number(PSAU/2024/R/1445).
文摘Intelligent vehicle tracking and detection are crucial tasks in the realm of highway management.However,vehicles come in a range of sizes,which is challenging to detect,affecting the traffic monitoring system’s overall accuracy.Deep learning is considered to be an efficient method for object detection in vision-based systems.In this paper,we proposed a vision-based vehicle detection and tracking system based on a You Look Only Once version 5(YOLOv5)detector combined with a segmentation technique.The model consists of six steps.In the first step,all the extracted traffic sequence images are subjected to pre-processing to remove noise and enhance the contrast level of the images.These pre-processed images are segmented by labelling each pixel to extract the uniform regions to aid the detection phase.A single-stage detector YOLOv5 is used to detect and locate vehicles in images.Each detection was exposed to Speeded Up Robust Feature(SURF)feature extraction to track multiple vehicles.Based on this,a unique number is assigned to each vehicle to easily locate them in the succeeding image frames by extracting them using the feature-matching technique.Further,we implemented a Kalman filter to track multiple vehicles.In the end,the vehicle path is estimated by using the centroid points of the rectangular bounding box predicted by the tracking algorithm.The experimental results and comparison reveal that our proposed vehicle detection and tracking system outperformed other state-of-the-art systems.The proposed implemented system provided 94.1%detection precision for Roundabout and 96.1%detection precision for Vehicle Aerial Imaging from Drone(VAID)datasets,respectively.
基金the Changsha Science and Technology Plan 2004081in part by the Science and Technology Program of Hunan Provincial Department of Transportation 202117in part by the Science and Technology Research and Development Program Project of the China Railway Group Limited 2021-Special-08.
文摘The detection of crack defects on the walls of road tunnels is a crucial step in the process of ensuring travel safetyand performing routine tunnel maintenance. The automatic and accurate detection of cracks on the surface of roadtunnels is the key to improving the maintenance efficiency of road tunnels. Machine vision technology combinedwith a deep neural network model is an effective means to realize the localization and identification of crackdefects on the surface of road tunnels.We propose a complete set of automatic inspection methods for identifyingcracks on the walls of road tunnels as a solution to the problem of difficulty in identifying cracks during manualmaintenance. First, a set of equipment applied to the real-time acquisition of high-definition images of walls inroad tunnels is designed. Images of walls in road tunnels are acquired based on the designed equipment, whereimages containing crack defects are manually identified and selected. Subsequently, the training and validationsets used to construct the crack inspection model are obtained based on the acquired images, whereas the regionscontaining cracks and the pixels of the cracks are finely labeled. After that, a crack area sensing module is designedbased on the proposed you only look once version 7 model combined with coordinate attention mechanism (CAYOLOV7) network to locate the crack regions in the road tunnel surface images. Only subimages containingcracks are acquired and sent to the multiscale semantic segmentation module for extraction of the pixels to whichthe cracks belong based on the DeepLab V3+ network. The precision and recall of the crack region localizationon the surface of a road tunnel based on our proposed method are 82.4% and 93.8%, respectively. Moreover, themean intersection over union (MIoU) and pixel accuracy (PA) values for achieving pixel-level detection accuracyare 76.84% and 78.29%, respectively. The experimental results on the dataset show that our proposed two-stagedetection method outperforms other state-of-the-art models in crack region localization and detection. Based onour proposedmethod, the images captured on the surface of a road tunnel can complete crack detection at a speed often frames/second, and the detection accuracy can reach 0.25 mm, which meets the requirements for maintenanceof an actual project. The designed CA-YOLO V7 network enables precise localization of the area to which a crackbelongs in images acquired under different environmental and lighting conditions in road tunnels. The improvedDeepLab V3+ network based on lightweighting is able to extract crack morphology in a given region more quicklywhile maintaining segmentation accuracy. The established model combines defect localization and segmentationmodels for the first time, realizing pixel-level defect localization and extraction on the surface of road tunnelsin complex environments, and is capable of determining the actual size of cracks based on the physical coordinatesystemafter camera calibration. The trainedmodelhas highaccuracy andcanbe extendedandapplied to embeddedcomputing devices for the assessment and repair of damaged areas in different types of road tunnels.
基金This work is supported by the National Natural Science Foundation of China under Grant No.62001341the National Natural Science Foundation of Jiangsu Province under Grant No.BK20221379the Jiangsu Engineering Research Center of Digital Twinning Technology for Key Equipment in Petrochemical Process under Grant No.DTEC202104.
文摘This paper focuses on the task of few-shot 3D point cloud semantic segmentation.Despite some progress,this task still encounters many issues due to the insufficient samples given,e.g.,incomplete object segmentation and inaccurate semantic discrimination.To tackle these issues,we first leverage part-whole relationships into the task of 3D point cloud semantic segmentation to capture semantic integrity,which is empowered by the dynamic capsule routing with the module of 3D Capsule Networks(CapsNets)in the embedding network.Concretely,the dynamic routing amalgamates geometric information of the 3D point cloud data to construct higher-level feature representations,which capture the relationships between object parts and their wholes.Secondly,we designed a multi-prototype enhancement module to enhance the prototype discriminability.Specifically,the single-prototype enhancement mechanism is expanded to the multi-prototype enhancement version for capturing rich semantics.Besides,the shot-correlation within the category is calculated via the interaction of different samples to enhance the intra-category similarity.Ablation studies prove that the involved part-whole relations and proposed multi-prototype enhancement module help to achieve complete object segmentation and improve semantic discrimination.Moreover,under the integration of these two modules,quantitative and qualitative experiments on two public benchmarks,including S3DIS and ScanNet,indicate the superior performance of the proposed framework on the task of 3D point cloud semantic segmentation,compared to some state-of-the-art methods.
基金the National Natural Science Foundation of China(Grant Number 62066013)Hainan Provincial Natural Science Foundation of China(Grant Numbers 622RC674 and 2019RC182).
文摘High-resolution remote sensing image segmentation is a challenging task. In urban remote sensing, the presenceof occlusions and shadows often results in blurred or invisible object boundaries, thereby increasing the difficultyof segmentation. In this paper, an improved network with a cross-region self-attention mechanism for multi-scalefeatures based onDeepLabv3+is designed to address the difficulties of small object segmentation and blurred targetedge segmentation. First,we use CrossFormer as the backbone feature extraction network to achieve the interactionbetween large- and small-scale features, and establish self-attention associations between features at both large andsmall scales to capture global contextual feature information. Next, an improved atrous spatial pyramid poolingmodule is introduced to establish multi-scale feature maps with large- and small-scale feature associations, andattention vectors are added in the channel direction to enable adaptive adjustment of multi-scale channel features.The proposed networkmodel is validated using the PotsdamandVaihingen datasets. The experimental results showthat, compared with existing techniques, the network model designed in this paper can extract and fuse multiscaleinformation, more clearly extract edge information and small-scale information, and segment boundariesmore smoothly. Experimental results on public datasets demonstrate the superiority of ourmethod compared withseveral state-of-the-art networks.
基金funded in part by the Key Project of Nature Science Research for Universities of Anhui Province of China(No.2022AH051720)in part by the Science and Technology Development Fund,Macao SAR(Grant Nos.0093/2022/A2,0076/2022/A2 and 0008/2022/AGJ)in part by the China University Industry-University-Research Collaborative Innovation Fund(No.2021FNA04017).
文摘This paper focuses on the effective utilization of data augmentation techniques for 3Dlidar point clouds to enhance the performance of neural network models.These point clouds,which represent spatial information through a collection of 3D coordinates,have found wide-ranging applications.Data augmentation has emerged as a potent solution to the challenges posed by limited labeled data and the need to enhance model generalization capabilities.Much of the existing research is devoted to crafting novel data augmentation methods specifically for 3D lidar point clouds.However,there has been a lack of focus on making the most of the numerous existing augmentation techniques.Addressing this deficiency,this research investigates the possibility of combining two fundamental data augmentation strategies.The paper introduces PolarMix andMix3D,two commonly employed augmentation techniques,and presents a new approach,named RandomFusion.Instead of using a fixed or predetermined combination of augmentation methods,RandomFusion randomly chooses one method from a pool of options for each instance or sample.This innovative data augmentation technique randomly augments each point in the point cloud with either PolarMix or Mix3D.The crux of this strategy is the random choice between PolarMix and Mix3Dfor the augmentation of each point within the point cloud data set.The results of the experiments conducted validate the efficacy of the RandomFusion strategy in enhancing the performance of neural network models for 3D lidar point cloud semantic segmentation tasks.This is achieved without compromising computational efficiency.By examining the potential of merging different augmentation techniques,the research contributes significantly to a more comprehensive understanding of how to utilize existing augmentation methods for 3D lidar point clouds.RandomFusion data augmentation technique offers a simple yet effective method to leverage the diversity of augmentation techniques and boost the robustness of models.The insights gained from this research can pave the way for future work aimed at developing more advanced and efficient data augmentation strategies for 3D lidar point cloud analysis.
文摘In cornfields,factors such as the similarity between corn seedlings and weeds and the blurring of plant edge details pose challenges to corn and weed segmentation.In addition,remote areas such as farmland are usually constrained by limited computational resources and limited collected data.Therefore,it becomes necessary to lighten the model to better adapt to complex cornfield scene,and make full use of the limited data information.In this paper,we propose an improved image segmentation algorithm based on unet.Firstly,the inverted residual structure is introduced into the contraction path to reduce the number of parameters in the training process and improve the feature extraction ability;secondly,the pyramid pooling module is introduced to enhance the network’s ability of acquiring contextual information as well as the ability of dealing with the small target loss problem;and lastly,Finally,to further enhance the segmentation capability of the model,the squeeze and excitation mechanism is introduced in the expansion path.We used images of corn seedlings collected in the field and publicly available corn weed datasets to evaluate the improved model.The improved model has a total parameter of 3.79 M and miou can achieve 87.9%.The fps on a single 3050 ti video card is about 58.9.The experimental results show that the network proposed in this paper can quickly segment corn weeds in a cornfield scenario with good segmentation accuracy.
文摘Computed Tomography(CT)is a commonly used technology in Printed Circuit Boards(PCB)non-destructive testing,and element segmentation of CT images is a key subsequent step.With the development of deep learning,researchers began to exploit the“pre-training and fine-tuning”training process for multi-element segmentation,reducing the time spent on manual annotation.However,the existing element segmentation model only focuses on the overall accuracy at the pixel level,ignoring whether the element connectivity relationship can be correctly identified.To this end,this paper proposes a PCB CT image element segmentation model optimizing the semantic perception of connectivity relationship(OSPC-seg).The overall training process adopts a“pre-training and fine-tuning”training process.A loss function that optimizes the semantic perception of circuit connectivity relationship(OSPC Loss)is designed from the aspect of alleviating the class imbalance problem and improving the correct connectivity rate.Also,the correct connectivity rate index(CCR)is proposed to evaluate the model’s connectivity relationship recognition capabilities.Experiments show that mIoU and CCR of OSPC-seg on our datasets are 90.1%and 97.0%,improved by 1.5%and 1.6%respectively compared with the baseline model.From visualization results,it can be seen that the segmentation performance of connection positions is significantly improved,which also demonstrates the effectiveness of OSPC-seg.
文摘Breast Arterial Calcification(BAC)is a mammographic decision dissimilar to cancer and commonly observed in elderly women.Thus identifying BAC could provide an expense,and be inaccurate.Recently Deep Learning(DL)methods have been introduced for automatic BAC detection and quantification with increased accuracy.Previously,classification with deep learning had reached higher efficiency,but designing the structure of DL proved to be an extremely challenging task due to overfitting models.It also is not able to capture the patterns and irregularities presented in the images.To solve the overfitting problem,an optimal feature set has been formed by Enhanced Wolf Pack Algorithm(EWPA),and their irregularities are identified by Dense-kUNet segmentation.In this paper,Dense-kUNet for segmentation and optimal feature has been introduced for classification(severe,mild,light)that integrates DenseUNet and kU-Net.Longer bound links exist among adjacent modules,allowing relatively rough data to be sent to the following component and assisting the system in finding higher qualities.The major contribution of the work is to design the best features selected by Enhanced Wolf Pack Algorithm(EWPA),and Modified Support Vector Machine(MSVM)based learning for classification.k-Dense-UNet is introduced which combines the procedure of Dense-UNet and kU-Net for image segmentation.Longer bound associations occur among nearby sections,allowing relatively granular data to be sent to the next subsystem and benefiting the system in recognizing smaller characteristics.The proposed techniques and the performance are tested using several types of analysis techniques 826 filled digitized mammography.The proposed method achieved the highest precision,recall,F-measure,and accuracy of 84.4333%,84.5333%,84.4833%,and 86.8667%when compared to other methods on the Digital Database for Screening Mammography(DDSM).
基金supported in part by the Tianjin Technology Innovation Guidance Special Fund Project under Grant No.21YDTPJC00850in part by the National Natural Science Foundation of China under Grant No.41906161in part by the Natural Science Foundation of Tianjin under Grant No.21JCQNJC00650。
文摘With the development of underwater sonar detection technology,simultaneous localization and mapping(SLAM)approach has attracted much attention in underwater navigation field in recent years.But the weak detection ability of a single vehicle limits the SLAM performance in wide areas.Thereby,cooperative SLAM using multiple vehicles has become an important research direction.The key factor of cooperative SLAM is timely and efficient sonar image transmission among underwater vehicles.However,the limited bandwidth of underwater acoustic channels contradicts a large amount of sonar image data.It is essential to compress the images before transmission.Recently,deep neural networks have great value in image compression by virtue of the powerful learning ability of neural networks,but the existing sonar image compression methods based on neural network usually focus on the pixel-level information without the semantic-level information.In this paper,we propose a novel underwater acoustic transmission scheme called UAT-SSIC that includes semantic segmentation-based sonar image compression(SSIC)framework and the joint source-channel codec,to improve the accuracy of the semantic information of the reconstructed sonar image at the receiver.The SSIC framework consists of Auto-Encoder structure-based sonar image compression network,which is measured by a semantic segmentation network's residual.Considering that sonar images have the characteristics of blurred target edges,the semantic segmentation network used a special dilated convolution neural network(DiCNN)to enhance segmentation accuracy by expanding the range of receptive fields.The joint source-channel codec with unequal error protection is proposed that adjusts the power level of the transmitted data,which deal with sonar image transmission error caused by the serious underwater acoustic channel.Experiment results demonstrate that our method preserves more semantic information,with advantages over existing methods at the same compression ratio.It also improves the error tolerance and packet loss resistance of transmission.
文摘As the field of autonomous driving evolves, real-time semantic segmentation has become a crucial part of computer vision tasks. However, most existing methods use lightweight convolution to reduce the computational effort, resulting in lower accuracy. To address this problem, we construct TBANet, a network with an encoder-decoder structure for efficient feature extraction. In the encoder part, the TBA module is designed to extract details and the ETBA module is used to learn semantic representations in a high-dimensional space. In the decoder part, we design a combination of multiple upsampling methods to aggregate features with less computational overhead. We validate the efficiency of TBANet on the Cityscapes dataset. It achieves 75.1% mean Intersection over Union(mIoU) with only 2.07 million parameters and can reach 90.3 Frames Per Second(FPS).
基金supported in part by the National Key Research and Development Program of China(2018YFB1305002)the National Natural Science Foundation of China(62006256)+2 种基金the Postdoctoral Science Foundation of China(2020M683050)the Key Research and Development Program of Guangzhou(202007050002)the Fundamental Research Funds for the Central Universities(67000-31610134)。
文摘Because pixel values of foggy images are irregularly higher than those of images captured in normal weather(clear images),it is difficult to extract and express their texture.No method has previously been developed to directly explore the relationship between foggy images and semantic segmentation images.We investigated this relationship and propose a generative adversarial network(GAN)for foggy image semantic segmentation(FISS GAN),which contains two parts:an edge GAN and a semantic segmentation GAN.The edge GAN is designed to generate edge information from foggy images to provide auxiliary information to the semantic segmentation GAN.The semantic segmentation GAN is designed to extract and express the texture of foggy images and generate semantic segmentation images.Experiments on foggy cityscapes datasets and foggy driving datasets indicated that FISS GAN achieved state-of-the-art performance.
基金Project(61806107)supported by the National Natural Science Foundation of ChinaProject supported by the Shandong Key Laboratory of Wisdom Mine Information Technology,ChinaProject supported by the Opening Project of State Key Laboratory of Digital Publishing Technology,China。
文摘Semantic segmentation is a crucial step for document understanding.In this paper,an NVIDIA Jetson Nano-based platform is applied for implementing semantic segmentation for teaching artificial intelligence concepts and programming.To extract semantic structures from document images,we present an end-to-end dilated convolution network architecture.Dilated convolutions have well-known advantages for extracting multi-scale context information without losing spatial resolution.Our model utilizes dilated convolutions with residual network to represent the image features and predicting pixel labels.The convolution part works as feature extractor to obtain multidimensional and hierarchical image features.The consecutive deconvolution is used for producing full resolution segmentation prediction.The probability of each pixel decides its predefined semantic class label.To understand segmentation granularity,we compare performances at three different levels.From fine grained class to coarse class levels,the proposed dilated convolution network architecture is evaluated on three document datasets.The experimental results have shown that both semantic data distribution imbalance and network depth are import factors that influence the document’s semantic segmentation performances.The research is aimed at offering an education resource for teaching artificial intelligence concepts and techniques.
基金National Natural Science Foundation of China(Nos.61941109,62061023)Distinguished Young Scholars of Gansu Province of China(No.21JR7RA345)。
文摘In the study of automatic driving,understanding the road scene is a key to improve driving safety.The semantic segmentation method could divide the image into different areas associated with semantic categories in accordance with the pixel level,so as to help vehicles to perceive and obtain the surrounding road environment information,which would improve driving safety.Deeplabv3+is the current popular semantic segmentation model.There are phenomena that small targets are missed and similar objects are easily misjudged during its semantic segmentation tasks,which leads to rough segmentation boundary and reduces semantic accuracy.This study focuses on the issue,based on the Deeplabv3+network structure and combined with the attention mechanism,to increase the weight of the segmentation area,and then proposes an improved Deeplabv3+fusion attention mechanism for road scene semantic segmentation method.First,a group of parallel position attention module and channel attention module are introduced on the Deeplabv3+encoding end to capture more spatial context information and high-level semantic information.Then,an attention mechanism is introduced to restore the spatial detail information,and the data shall be normalized in order to accelerate the convergence speed of the model at the decoding end.The effects of model segmentation with different attention-introducing mechanisms are compared and tested on CamVid and Cityscapes datasets.The experimental results show that the mean Intersection over Unons of the improved model segmentation accuracies on the two datasets are boosted by 6.88%and 2.58%,respectively,which is better than using Deeplabv3+.This method does not significantly increase the amount of network calculation and complexity,and has a good balance of speed and accuracy.
基金the National Research Foundation of Korea(NRF,www.nrf.re.kr)grant funded by the Korean government(MSIT,www.msit.go.kr)(No.2018R1A2B6009188)(received by W.K.Loh).
文摘Multiple ocular region segmentation plays an important role in different applications such as biometrics,liveness detection,healthcare,and gaze estimation.Typically,segmentation techniques focus on a single region of the eye at a time.Despite the number of obvious advantages,very limited research has focused on multiple regions of the eye.Similarly,accurate segmentation of multiple eye regions is necessary in challenging scenarios involving blur,ghost effects low resolution,off-angles,and unusual glints.Currently,the available segmentation methods cannot address these constraints.In this paper,to address the accurate segmentation of multiple eye regions in unconstrainted scenarios,a lightweight outer residual encoder-decoder network suitable for various sensor images is proposed.The proposed method can determine the true boundaries of the eye regions from inferior-quality images using the high-frequency information flow from the outer residual encoder-decoder deep convolutional neural network(called ORED-Net).Moreover,the proposed ORED-Net model does not improve the performance based on the complexity,number of parameters or network depth.The proposed network is considerably lighter than previous state-of-theart models.Comprehensive experiments were performed,and optimal performance was achieved using SBVPI and UBIRIS.v2 datasets containing images of the eye region.The simulation results obtained using the proposed OREDNet,with the mean intersection over union score(mIoU)of 89.25 and 85.12 on the challenging SBVPI and UBIRIS.v2 datasets,respectively.
基金the National Natural Science Foundation of China(61772386)Joint fund project(nsfc-guangdong big data science center project),project number:U1611262,Hubei University of Science and Technology,Master of Engineering,special construction,project number:2018-19GZ01,Hubei University of Science and Technology Teaching Reform Project,project number:2018-XB-023,S201910927028.
文摘This paper proposes a novel method of lane detection,which adopts VGG16 as the basis of convolutional neural network to extract lane line features by cavity convolution,wherein the lane lines are divided into dotted lines and solid lines.Expanding the field of experience through hollow convolution,the full connection layer of the network is discarded,the last largest pooling layer of the VGG16 network is removed,and the processing of the last three convolution layers is replaced by hole convolution.At the same time,CNN adopts the encoder and decoder structure mode,and uses the index function of the maximum pooling layer in the decoder part to upsample the encoder in a counter-pooling manner,realizing semantic segmentation.And combined with the instance segmentation,and finally through the fitting to achieve the detection of the lane line.In addition,the currently disclosed lane line data sets are relatively small,and there is no distinction between lane solid lines and dashed lines.To this end,our work made a lane line data set for the lane virtual and real identification,and based on the proposed algorithm effective verification of the data set achieved by the increased segmentation.The final test shows that the proposed method has a good balance between lane detection speed and accuracy,which has good robustness.
文摘Buildings undergo various kinds of structural damage during earthquakes,and damage detection and functional assessment of these structures in the aftermath of the events have been challenging issues.Under these circumstances,computer vision techniques offer a promising solution by automating the inspection process.This study presents an effective methodology for automatic structural components and damage detection using unmanned aerial vehicle(UAV)images of damaged buildings.Two types of neural network architectures are considered for appropriate feature extractions in different task detections.The feature pyramid network(FPN)is employed for crack,spall,rebar,and component damage segmentation,while the UNet++network is utilized for the damage state.For network training and validation,a total of 3805 original images of size 1920×1080 pixels are processed by the proposed method and reduced the image pixels.From the FPN,the achieved highest intersection over unions(IoUs)were 0.59,0.93,0.42,and 0.99 for crack,spall,rebar,and components,respectively.These predicted labels were found in close agreement with the labels.Similarly,the UNet++recognized the semantic information and damage state with an IoU of 0.72.This demonstrated the applicability of the proposed method for automated post-earthquake building inspection process accurately without information loss from the original images.