Abstract: Driven by deep learning, three-dimensional (3D) object reconstruction from two-dimensional (2D) images has developed rapidly. A common approach is to extract an attribute flow from the 2D images and use it to guide a decoder that estimates the 3D object structure. The three-dimensional disentangled attribute flow (3DAttriFlow) model disentangles the extracted attribute flows and guides 3D object reconstruction with explicit geometric and semantic attributes. However, 3DAttriFlow only supports reconstruction from a single-view image; information missing in the occluded regions of a single view leaves its reconstruction quality with room for improvement. This paper extends 3DAttriFlow to 3D object reconstruction from multi-view images, improving the model in two aspects: feature extraction and multi-view feature fusion. For feature extraction, a channel attention mechanism is introduced into the original ResNet18 backbone to highlight important channel features; for multi-view fusion, an attention module fuses the features of the individual views to obtain more complete and richer object features. Experiments on a ShapeNet subset show that, compared with the original 3DAttriFlow model, the improved model achieves better 3D reconstruction quality.
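The abstract names two attention components but not their exact form. The sketch below illustrates one plausible reading, assuming an SE-style (squeeze-and-excitation) channel gate for the backbone and a simple dot-product scoring scheme for multi-view fusion; all function names, weight shapes, and the scoring design are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def channel_attention(feat, w1, w2):
    """SE-style channel attention (an assumed form of the paper's
    channel attention): squeeze via global average pooling, excite
    through a two-layer bottleneck, then rescale each channel."""
    # feat: (C, H, W) feature map from one backbone stage
    squeeze = feat.mean(axis=(1, 2))                # (C,) global descriptor
    hidden = np.maximum(w1 @ squeeze, 0.0)          # ReLU bottleneck, (C/r,)
    scale = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))    # sigmoid gate, (C,)
    return feat * scale[:, None, None]              # reweighted channels

def fuse_views(view_feats, wq, wk):
    """Attention fusion over views (an assumed design): score each
    view's pooled descriptor against a shared query and return the
    weighted combination of the V view features."""
    # view_feats: (V, C), one pooled feature vector per view
    q = view_feats.mean(axis=0) @ wq                # shared query, (C,)
    k = view_feats @ wk                             # per-view keys, (V, C)
    scores = softmax(k @ q / np.sqrt(k.shape[1]))   # (V,) attention weights
    return scores @ view_feats                      # fused feature, (C,)
```

In this reading, occluded regions that are missing in one view can still contribute through other views: the fusion weights let the decoder draw on whichever view carries the relevant information, which is the stated motivation for moving from single-view to multi-view input.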