Traditional models for semantic segmentation in point clouds primarily focus on smaller scales.However,in real-world applications,point clouds often exhibit larger scales,leading to heavy computational and memory requ...Traditional models for semantic segmentation in point clouds primarily focus on smaller scales.However,in real-world applications,point clouds often exhibit larger scales,leading to heavy computational and memory requirements.The key to handling large-scale point clouds lies in leveraging random sampling,which offers higher computational efficiency and lower memory consumption compared to other sampling methods.Nevertheless,the use of random sampling can potentially result in the loss of crucial points during the encoding stage.To address these issues,this paper proposes cross-fusion self-attention network(CFSA-Net),a lightweight and efficient network architecture specifically designed for directly processing large-scale point clouds.At the core of this network is the incorporation of random sampling alongside a local feature extraction module based on cross-fusion self-attention(CFSA).This module effectively integrates long-range contextual dependencies between points by employing hierarchical position encoding(HPC).Furthermore,it enhances the interaction between each point’s coordinates and feature information through cross-fusion self-attention pooling,enabling the acquisition of more comprehensive geometric information.Finally,a residual optimization(RO)structure is introduced to extend the receptive field of individual points by stacking hierarchical position encoding and cross-fusion self-attention pooling,thereby reducing the impact of information loss caused by random sampling.Experimental results on the Stanford Large-Scale 3D Indoor Spaces(S3DIS),Semantic3D,and SemanticKITTI datasets demonstrate the superiority of this algorithm over advanced approaches such as RandLA-Net and KPConv.These findings underscore the excellent performance of CFSA-Net in large-scale 3D semantic segmentation.展开更多
This paper proposes a new network structure,namely the ProNet network.Retinal medical image segmentation can help clinical diagnosis of related eye diseases and is essential for subsequent rational treatment.The basel...This paper proposes a new network structure,namely the ProNet network.Retinal medical image segmentation can help clinical diagnosis of related eye diseases and is essential for subsequent rational treatment.The baseline model of the ProNet network is UperNet(Unified perceptual parsing Network),and the backbone network is ConvNext(Convolutional Network).A network structure based on depth-separable convolution and 1×1 convolution is used,which has good performance and robustness.We further optimise ProNet mainly in two aspects.One is data enhancement using increased noise and slight angle rotation,which can significantly increase the diversity of data and help the model better learn the patterns and features of the data and improve the model’s performance.Meanwhile,it can effectively expand the training data set,reduce the influence of noise and abnormal data in the data set on the model,and improve the accuracy and reliability of the model.Another is the loss function aspect,and we finally use the focal loss function.The focal loss function is well suited for complex tasks such as object detection.The function will penalise the loss carried by samples that the model misclassifies,thus enabling better training of the model to avoid these errors while solving the category imbalance problem as a way to improve image segmentation density and segmentation accuracy.From the experimental results,the evaluation metrics mIoU(mean Intersection over Union)enhanced by 4.47%,and mDice enhanced by 2.92% compared to the baseline network.Better generalization effects and more accurate image segmentation are achieved.展开更多
基金funded by the National Natural Science Foundation of China Youth Project(61603127).
文摘Traditional models for semantic segmentation in point clouds primarily focus on smaller scales.However,in real-world applications,point clouds often exhibit larger scales,leading to heavy computational and memory requirements.The key to handling large-scale point clouds lies in leveraging random sampling,which offers higher computational efficiency and lower memory consumption compared to other sampling methods.Nevertheless,the use of random sampling can potentially result in the loss of crucial points during the encoding stage.To address these issues,this paper proposes cross-fusion self-attention network(CFSA-Net),a lightweight and efficient network architecture specifically designed for directly processing large-scale point clouds.At the core of this network is the incorporation of random sampling alongside a local feature extraction module based on cross-fusion self-attention(CFSA).This module effectively integrates long-range contextual dependencies between points by employing hierarchical position encoding(HPC).Furthermore,it enhances the interaction between each point’s coordinates and feature information through cross-fusion self-attention pooling,enabling the acquisition of more comprehensive geometric information.Finally,a residual optimization(RO)structure is introduced to extend the receptive field of individual points by stacking hierarchical position encoding and cross-fusion self-attention pooling,thereby reducing the impact of information loss caused by random sampling.Experimental results on the Stanford Large-Scale 3D Indoor Spaces(S3DIS),Semantic3D,and SemanticKITTI datasets demonstrate the superiority of this algorithm over advanced approaches such as RandLA-Net and KPConv.These findings underscore the excellent performance of CFSA-Net in large-scale 3D semantic segmentation.
文摘This paper proposes a new network structure,namely the ProNet network.Retinal medical image segmentation can help clinical diagnosis of related eye diseases and is essential for subsequent rational treatment.The baseline model of the ProNet network is UperNet(Unified perceptual parsing Network),and the backbone network is ConvNext(Convolutional Network).A network structure based on depth-separable convolution and 1×1 convolution is used,which has good performance and robustness.We further optimise ProNet mainly in two aspects.One is data enhancement using increased noise and slight angle rotation,which can significantly increase the diversity of data and help the model better learn the patterns and features of the data and improve the model’s performance.Meanwhile,it can effectively expand the training data set,reduce the influence of noise and abnormal data in the data set on the model,and improve the accuracy and reliability of the model.Another is the loss function aspect,and we finally use the focal loss function.The focal loss function is well suited for complex tasks such as object detection.The function will penalise the loss carried by samples that the model misclassifies,thus enabling better training of the model to avoid these errors while solving the category imbalance problem as a way to improve image segmentation density and segmentation accuracy.From the experimental results,the evaluation metrics mIoU(mean Intersection over Union)enhanced by 4.47%,and mDice enhanced by 2.92% compared to the baseline network.Better generalization effects and more accurate image segmentation are achieved.