Abstract
Accurate and efficient point cloud classification plays a key role in tasks such as scene understanding and digital twin city construction. Point cloud classification methods that rely on a single visual data structure, such as points or voxels, tend to lose critical geometric features, whereas methods that fuse multiple data structures learn multilevel, multiscale features from the different representations but struggle to balance the differences between them, which reduces classification accuracy. To address this, a point-voxel consistency constraint network (PVCC-Net) is proposed for accurately segmenting objects of different sizes in urban scenes. PVCC-Net adopts a dual-branch U-Net structure in which the voxel and point branches extract coarse-grained and fine-grained features, respectively, and a point-voxel consistency constraint module aligns the coarse- and fine-grained features to reduce the distribution gap between features of different granularities. The network then adaptively fuses the aggregated coarse- and fine-grained features through a point-voxel self-attention mechanism, enhancing the global feature representation of the point cloud. PVCC-Net is evaluated on three urban scene point cloud datasets: Toronto3D, Semantic3D, and SensatUrban. It achieves overall accuracies (OA) of 97.97%, 93.80%, and 93.00% and mean intersection over union (mIoU) scores of 82.92%, 75.70%, and 55.40%, respectively. Comparative experiments show that, relative to the baseline methods, the proposed method effectively improves the classification of point clouds in complex urban scenes and yields better classification results.
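The dual-branch design requires converting between the point and voxel representations: point features are scattered into a coarse voxel grid for the voxel branch and later gathered back to the points for fusion. The snippet below is a minimal PyTorch sketch of this voxelization/devoxelization step; the voxel size, feature dimensions, and feature-averaging scheme are illustrative assumptions, not the exact implementation described in the paper.

```python
import torch

def voxelize(points: torch.Tensor, feats: torch.Tensor, voxel_size: float):
    """Average point features into a sparse voxel grid.

    points: (N, 3) xyz coordinates; feats: (N, C) point features.
    Returns per-voxel features (M, C) and the point-to-voxel index (N,).
    """
    # Integer voxel coordinates of every point.
    coords = torch.floor(points / voxel_size).long()              # (N, 3)
    # Unique occupied voxels and, for each point, the voxel it falls into.
    uniq, point2voxel = torch.unique(coords, dim=0, return_inverse=True)
    M, C = uniq.shape[0], feats.shape[1]
    # Sum the features of all points in each voxel, then divide by the point count.
    voxel_feats = torch.zeros(M, C, device=feats.device).index_add_(0, point2voxel, feats)
    counts = torch.zeros(M, device=feats.device).index_add_(
        0, point2voxel, torch.ones(points.shape[0], device=feats.device))
    voxel_feats = voxel_feats / counts.clamp(min=1).unsqueeze(1)
    return voxel_feats, point2voxel

def devoxelize(voxel_feats: torch.Tensor, point2voxel: torch.Tensor):
    """Gather each point's voxel feature back to the point level (nearest-voxel lookup)."""
    return voxel_feats[point2voxel]                                # (N, C)

# Toy usage: 1000 random points with 32-dimensional features, 0.5 m voxels.
pts, f = torch.rand(1000, 3) * 10, torch.rand(1000, 32)
vf, idx = voxelize(pts, f, voxel_size=0.5)
per_point_coarse = devoxelize(vf, idx)   # (1000, 32) coarse features aligned to points
```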
Objective  Accurate and efficient point cloud classification plays a vital role in tasks such as scene understanding and digital twin city construction. Traditional classification methods manually extract features and construct discriminative models to classify point clouds. However, with the increasing density of point cloud acquisition and the growth in data volume, it is difficult for traditional methods to achieve accurate and efficient point cloud classification. Recently developed deep learning-based point cloud processing methods have advanced point cloud classification. Among them, methods using a single visual data structure, such as points or voxels, are prone to losing critical geometric features, whereas methods fusing multiple data structures can learn multilevel and multiscale features from the different data. However, it is difficult to balance the differences between the various data, which reduces the accuracy of point cloud classification. In addition, LiDAR point clouds acquired from complex urban scenes contain large amounts of noise and outliers that are difficult to process. These challenges remain open problems in current point cloud classification research.

Methods  To address these problems, a point-voxel consistency constraint network (PVCC-Net) is proposed to accurately segment objects of different sizes in urban scene point clouds. The overall structure of PVCC-Net is a dual-branch U-Net encoding-decoding architecture. First, the point and voxel branches extract features from different receptive fields. The point branch extracts point-level geometric semantic features through a local feature aggregation (LFA) module, which helps reduce the effects of feature redundancy and noise. The voxel branch progressively expands the receptive field by using a convolutional network to extract voxel features at different levels. The voxel format is regular and ordered in memory, which maintains the continuity of spatial information and compensates for the shortcomings of point clouds. The fine-grained point branch and coarse-grained voxel branch cover spatial scopes of different resolutions, and combining this multilevel contextual information enhances the feature extraction capability. The point-voxel consistency constraint (PV-CC) module adequately integrates fine-grained and coarse-grained features and enhances the adaptive ability between point clouds and voxels by constraining the distances between feature branches of different granularities in the same layer of the network, which enables the model to produce more stable prediction results. Subsequently, the point-voxel self-attention (PV-SA) mechanism fuses point and voxel features while enhancing the expression of the global features. Finally, the performance of the network is further improved via weighted cross-entropy and Lovasz loss functions, resulting in accurate and efficient point cloud classification in urban scenes.

Results and Discussion  The proposed PVCC-Net is trained and evaluated on three urban scene datasets, namely, Toronto3D, Semantic3D, and SensatUrban, achieving overall accuracies (OA) of 97.97%, 93.80%, and 93.00% and mean intersection over union (mIoU) scores of 82.92%, 75.70%, and 55.40%, respectively. All experimental results outperform the baseline network (Table 2, Fig. 6, and Fig. 9).
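The two ideas carried by the PV-CC and PV-SA modules, constraining the distance between same-layer point and voxel features and then fusing them with self-attention, can be sketched roughly as follows. This is a minimal, generic PyTorch illustration; the module name, projection dimensions, and the choice of a mean-squared feature distance and a standard multi-head attention layer are assumptions for the sketch, not the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PointVoxelFusion(nn.Module):
    """Sketch: align same-layer point/voxel features, then fuse them with self-attention."""

    def __init__(self, dim: int = 64, heads: int = 4):
        super().__init__()
        self.proj_point = nn.Linear(dim, dim)   # project point (fine-grained) features
        self.proj_voxel = nn.Linear(dim, dim)   # project voxel (coarse-grained) features
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, point_feats, voxel_feats_at_points):
        # Both inputs: (B, N, dim); voxel features already gathered back to the points.
        p = self.proj_point(point_feats)
        v = self.proj_voxel(voxel_feats_at_points)

        # Consistency constraint: penalize the distance between the two granularities
        # so their distributions stay aligned (here a simple mean-squared distance).
        consistency_loss = F.mse_loss(p, v)

        # Self-attention over the concatenated point/voxel tokens, then fold the
        # result back into one fused feature per point.
        tokens = torch.cat([p, v], dim=1)                 # (B, 2N, dim)
        fused, _ = self.attn(tokens, tokens, tokens)      # (B, 2N, dim)
        n = p.shape[1]
        fused = fused[:, :n] + fused[:, n:]               # (B, N, dim)
        return fused, consistency_loss

# Toy usage: batch of 2 clouds, 1024 points, 64-dim features from each branch.
module = PointVoxelFusion(dim=64)
fused, l_cc = module(torch.rand(2, 1024, 64), torch.rand(2, 1024, 64))
```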
In addition, PVCC-Net achieves competitive results compared with other state-of-the-art methods, which demonstrates its strong generalizability (Tables 3 and 4). Notably, PVCC-Net not only maintains the integrity of the internal structure of each category but also produces clear and accurate segmentation boundaries between categories (Figs. 4, 7, and 10). The comparative experiments and ablation studies demonstrate that features of different granularities have different semantic representation capabilities. Combining fine-grained point features with coarse-grained voxel features significantly improves the accuracy of point cloud classification, and the consistency constraint reduces the differences between features of different granularities by minimizing the feature distance, thereby improving the stability and robustness of the model (Table 5). However, the complexity analysis indicates a higher number of parameters and FLOPs in PVCC-Net, mainly because the convolution and deconvolution operations in the voxel branch incur considerable computational costs; nevertheless, the latency is close to that of the point-based and point-voxel fusion methods (Table 6).

Conclusions  In this study, PVCC-Net is used for LiDAR point cloud classification in urban scenes. The network first aligns the distributions of fine-grained point features and coarse-grained voxel features through a point-voxel consistency constraint module, then uses a point-voxel self-attention mechanism to capture long-distance contextual information and enhance the global feature representation, and finally alleviates the class imbalance of urban-scene point clouds via square-root-weighted cross-entropy and Lovasz loss functions to achieve accurate point cloud classification. On the Toronto3D, Semantic3D, and SensatUrban datasets, PVCC-Net improves the mIoU by 3.44, 0.90, and 2.30 percentage points, respectively, compared with RandLA-Net. In addition, the classification performance of PVCC-Net is comparable to that of other advanced methods. The results of the comparative experiments and ablation studies show that deeply fusing fine-grained point features with coarse-grained voxel features enhances the capability of the model to extract complex features in urban scenes, and further constraining the point and voxel features maintains the consistency of the feature distributions and improves the stability of the model's prediction results. However, PVCC-Net has a higher number of parameters and a higher computational cost. Therefore, in future research, we will explore the synergistic and complementary effects of points and voxels in lightweight scene point cloud classification.
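The class-imbalance handling mentioned above, square-root-weighted cross-entropy combined with a Lovasz term, might look roughly like the following. This is a hedged sketch: the inverse-square-root weighting formula, the illustrative class counts, and the reference to an external `lovasz_softmax` implementation are assumptions, not the exact formulation used by PVCC-Net.

```python
import torch
import torch.nn as nn

def sqrt_class_weights(class_counts: torch.Tensor) -> torch.Tensor:
    """Down-weight frequent classes: weight_c proportional to 1 / sqrt(frequency_c)."""
    freq = class_counts.float() / class_counts.sum()
    weights = 1.0 / torch.sqrt(freq + 1e-8)
    return weights / weights.mean()          # normalize so the average weight is 1

# Example: per-class point counts from a training split (illustrative numbers only).
counts = torch.tensor([5_000_000, 800_000, 120_000, 30_000, 5_000])
criterion = nn.CrossEntropyLoss(weight=sqrt_class_weights(counts))

logits = torch.randn(4096, 5)                # (num_points, num_classes)
labels = torch.randint(0, 5, (4096,))
loss = criterion(logits, labels)
# total_loss = loss + lovasz_softmax(logits.softmax(-1), labels)  # if a Lovasz term is added
```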
Authors
李虎辰
管海燕
雷相达
秦楠楠
倪欢
Li Huchen; Guan Haiyan; Lei Xiangda; Qin Nannan; Ni Huan (School of Remote Sensing & Geomatics Engineering, Nanjing University of Information Science & Technology, Nanjing 210044, Jiangsu, China)
Source
《中国激光》
EI
CAS
CSCD
PKU Core Journals (北大核心)
2024, No. 13, pp. 243-256 (14 pages)
Chinese Journal of Lasers
Funding
Open Fund of the Key Laboratory of Land Satellite Remote Sensing Application, Ministry of Natural Resources (KLSMNR-G202305)
National Natural Science Foundation of China (41971414, 42371447)
Postgraduate Research & Practice Innovation Program of Jiangsu Province (KYCX22_1212).
Keywords
remote sensing
point cloud classification
voxel
consistency constraint
self-attention mechanism
urban scene