摘要
Traditional models for semantic segmentation in point clouds primarily focus on smaller scales.However,in real-world applications,point clouds often exhibit larger scales,leading to heavy computational and memory requirements.The key to handling large-scale point clouds lies in leveraging random sampling,which offers higher computational efficiency and lower memory consumption compared to other sampling methods.Nevertheless,the use of random sampling can potentially result in the loss of crucial points during the encoding stage.To address these issues,this paper proposes cross-fusion self-attention network(CFSA-Net),a lightweight and efficient network architecture specifically designed for directly processing large-scale point clouds.At the core of this network is the incorporation of random sampling alongside a local feature extraction module based on cross-fusion self-attention(CFSA).This module effectively integrates long-range contextual dependencies between points by employing hierarchical position encoding(HPC).Furthermore,it enhances the interaction between each point’s coordinates and feature information through cross-fusion self-attention pooling,enabling the acquisition of more comprehensive geometric information.Finally,a residual optimization(RO)structure is introduced to extend the receptive field of individual points by stacking hierarchical position encoding and cross-fusion self-attention pooling,thereby reducing the impact of information loss caused by random sampling.Experimental results on the Stanford Large-Scale 3D Indoor Spaces(S3DIS),Semantic3D,and SemanticKITTI datasets demonstrate the superiority of this algorithm over advanced approaches such as RandLA-Net and KPConv.These findings underscore the excellent performance of CFSA-Net in large-scale 3D semantic segmentation.
基金
funded by the National Natural Science Foundation of China Youth Project(61603127).